The topic of this project was chosen because many people compare the modern rock band Greta Van Fleet to the classic rock band Led Zeppelin. One author writes, "From the moment the Grammy-winning four-piece burst on to the scene, the Michigan rockers [of Greta Van Fleet] have drawn heavy comparison with hard rock’s most iconic and influential band [Led Zeppelin]." We wanted to use XML and Python tools to do a deeper analysis of their music to see if that's true.
"XProc is a declarative programming language with an XML syntax for processing data in pipelines. XProc is based on the concept of pipelines. Pipelines take data as input and generate output data. A pipeline consists of steps which perform various actions on the data. These actions range from reading and writing data, Web requests and complex transformations and validations. XProc is mainly focused on manipulating XML but can handle HTML, JSON, text and binary data as well."
Basically, XProc makes our lives easier and more organized when we have lots of input data and want to transform/output it in multiple steps.
Here is an example of one of our pipeline files (.xpl file extension, for "XProc PipeLine"). The pipeline contains multiple steps, which are explained in detail below.
You can run this pipeline file from the command line using an XProc processor. There are two open-source options: XML Calabash and MorganaXProc-III. We installed and worked with both in DIGIT 210, but ultimately we used MorganaXProc-III because it ran more efficiently.
Using a step called <p:identity message="[text here]"/>, you can tell your pipeline to output messages at the command line as it runs. This helps you understand what the pipeline is doing and exactly when it hits an error, which is especially useful when running a <p:for-each> loop to process each song in an album. We created a variable called filename and used $filename in these messages to track which song file caused an error.
Note: There is one pipeline file for each album, stored with the input data for each album. Ideally, one pipeline file could handle everything, but we chose to use nearly identical pipeline files for each album with only the file paths differing. This was because we couldn’t easily create a single pipeline to read all directories (albums) of songs and output them in a consistent, organized manner. So, each pipeline processes one album at a time.
Our raw text files are chord charts of every song on the first four albums of each artist, organized as: ../Artist/Album/song-##.txt
Our raw-text resources also include the last five albums of Led Zeppelin's discography, but due to time constraints, and because Greta Van Fleet's discography currently contains only four albums, we decided to process only Led Zeppelin's first four.
The chord charts we used are in the "chords over lyrics" format, meaning that the chords (roughly) line up with the lyrics they're played over. We did not check this alignment for accuracy, simply because that level of detail is neither analyzed nor preserved through the pipeline for this project.
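To illustrate the format (with made-up chords and a placeholder lyric, not a line from either band's catalog), a chart in this style looks roughly like:

```
[Verse 1]
C        G            Am       F
A placeholder lyric line sung under these chords
```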
These chord charts were pulled from Ultimate Guitar, an online platform where guitarists and musicians find user-created learning resources for millions of songs. It offers the ability to download chord charts as PDFs, but for our purposes, copying and pasting into .txt files was more efficient. The accuracy and formatting consistency of these resources is surprisingly high considering they are all user-created, but it's still ideal to check them manually. @mrs7068 checked a good portion of the songs, but he points out that, while largely inconsequential to the project, there are still many errors throughout the song files.
"Invisible XML is a language for describing the implicit structure of data, and a set of technologies for making that structure explicit as XML markup. It allows you to write a declarative description of the format of some text and then leverage that format to represent the text as structured information."
Invisible XML (ixml) is a relatively new language (c. 2022). It allows plain text to be turned into XML by analyzing the "invisible" patterns in the text and creating a set of rules called a "grammar." If you want to learn more, check out Norm Tovey-Walsh's introduction to ixml.
There are lots of ixml processors that also run at the command line, similar to XProc processors. @mrs7068: When I ran the ixml on its own, I used Markup Blitz, but I believe my configuration of MorganaXProc-III used Norm Tovey-Walsh's CoffeePot in the pipeline.
The extent of ixml in this project was outlining the different song sections (all of which begin with the section name in brackets: [Section]), creating root elements for well-formed XML, and capturing the first four lines of each file as metadata for the song (title, album, artist, key).
There is a possibility that the texts could have been further encoded using ixml, but given that it is so new and there are limited help resources available, our implementation was enough of a challenge!
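Our actual ixml grammar is not reproduced here, but for readers who think better in code, the effect of this step can be approximated in Python. Everything below is an illustrative sketch, not our pipeline: the element names song, metadata, section, and line are invented for this example (only the four metadata fields match our files), and the real transformation is driven by an ixml grammar rather than hand-written parsing.

```python
# Illustrative approximation of what our ixml step produces.
# Element names "song", "metadata", "section", and "line" are invented
# for this sketch; title/album/artist/key match the metadata fields above.
import re
import xml.etree.ElementTree as ET

def chart_to_xml(text: str) -> ET.Element:
    lines = text.splitlines()
    song = ET.Element("song")
    meta = ET.SubElement(song, "metadata")
    # The first four lines of every chart hold the song metadata.
    for field, value in zip(("title", "album", "artist", "key"), lines):
        ET.SubElement(meta, field).text = value.strip()
    section = None
    for line in lines[4:]:
        header = re.fullmatch(r"\[(.+)\]", line.strip())
        if header:
            # A bracketed line like [Verse 1] opens a new section.
            section = ET.SubElement(song, "section", name=header.group(1))
        elif section is not None and line.strip():
            ET.SubElement(section, "line").text = line
    return song

chart = """My Song
My Album
My Band
C

[Verse 1]
C        G
Some lyric line
"""
root = chart_to_xml(chart)
print(root.find("metadata/key").text)                     # C
print([s.get("name") for s in root.findall("section")])   # ['Verse 1']
```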
This step results in the ixml-output directory.
The Music Encoding Initiative (MEI) gets an honorable mention here because we researched it in the early stages of our project. It is an expansive "open-source effort to define a system for encoding musical documents in a machine-readable structure" (music-encoding.org). We mention it here, after the ixml step, because the resulting basic XML structure could be transformed into MEI schema-compliant data. Ultimately, given the scope of this project, conforming to MEI's guidelines was unnecessary. But they're worth checking out if you find our project interesting!
This is pretty basic XSLT with one exception—a monster Regex statement created by @ebeshero:
\n(\s*([A-Z][#ba-z/0-9]*) *([A-Z][#ba-z/0-9]*)?)*\n
This expression is used in <xsl:analyze-string> to differentiate a line of chords from a line of lyrics. It is very impressive and deserves a moment of recognition, as it has not resulted in any noticeable errors (like classifying the word "Oh" as a chord, even though it technically could qualify). Once analyze-string determines whether a line holds chords or lyrics, the stylesheet either tokenizes the chords on spaces and wraps each in its own element (wrapping all of those in a <chordLine> element per line of chords), or simply tags the lyrics.
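For the curious, the behavior of this regex can be demonstrated outside of XSLT. The sketch below runs the same expression through Python's re module instead of <xsl:analyze-string>; the sample lines are invented, not lyrics from our corpus.

```python
# Demonstrating the chord-line regex (by @ebeshero) with Python's re
# module rather than <xsl:analyze-string>.
import re

# The same expression our XSLT uses, anchored by the surrounding newlines.
CHORD_LINE = re.compile(r"\n(\s*([A-Z][#ba-z/0-9]*) *([A-Z][#ba-z/0-9]*)?)*\n")

def is_chord_line(line: str) -> bool:
    # Wrap the line in newlines so the \n anchors in the pattern apply.
    return CHORD_LINE.fullmatch("\n" + line + "\n") is not None

print(is_chord_line("C        G            Am       F"))  # True
print(is_chord_line("Standing in the light"))             # False
print(is_chord_line("Oh"))  # True -- the edge case noted above

# Once a line is known to be chords, splitting on whitespace mimics
# what the stylesheet does before wrapping each chord in its own element.
print("C        G   Am".split())  # ['C', 'G', 'Am']
```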
This XSLT document is an ever-growing monster. It takes the full XML output and adds a @num attribute to the <chord> elements. It does so only if the chord is defined explicitly in the XSLT document, which means every possible @num value had to be created by hand.
What is this mysterious @num value? It is the number the Nashville Number System assigns to the chord: each chord gets a number based on its position in the scale (or key) the song uses. For example, take a song in C: the C chord is assigned the number 1, D the number 2, and so on. So, using the song's key (found in the metadata), the chords are assigned numbers from the set defined for that key. There are, of course, chord extensions like "m" for minor, "maj7" for adding the major 7th scale degree, etc. These all needed to be accounted for in the chordsToNumbers.xsl stylesheet so that chords with extensions would also receive an accurate @num value.
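chordsToNumbers.xsl itself is a long hand-built lookup, but the idea behind it can be sketched compactly. The Python below is an illustration, not our stylesheet: the lookup table covers only three major keys (spelled with sharps) for brevity, and the function name chord_to_num is invented for this example.

```python
# A sketch of the Nashville numbering our chordsToNumbers.xsl performs
# (the stylesheet itself enumerates every chord by hand).
# Major scales only, spelled with sharps, for brevity.
MAJOR_SCALES = {
    "C": ["C", "D", "E", "F", "G", "A", "B"],
    "G": ["G", "A", "B", "C", "D", "E", "F#"],
    "D": ["D", "E", "F#", "G", "A", "B", "C#"],
}

def chord_to_num(chord: str, key: str) -> str:
    scale = MAJOR_SCALES[key]
    # Split the root (letter plus optional accidental) from any
    # extension such as "m" or "maj7".
    root_len = 2 if len(chord) > 1 and chord[1] in "#b" else 1
    root, extension = chord[:root_len], chord[root_len:]
    # The Nashville number is the root's 1-based position in the scale;
    # the extension is carried along unchanged.
    return str(scale.index(root) + 1) + extension

print(chord_to_num("C", "C"))      # 1
print(chord_to_num("Am", "C"))     # 6m
print(chord_to_num("F#m", "D"))    # 3m
print(chord_to_num("Cmaj7", "G"))  # 4maj7
```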
So, why take this extra step? Because it makes the most sense for comparing the chords of songs with one another. Sure, the original data can be used to see which chords are commonly played, since some chords are easier to play on the guitar. But to analyze chord progressions in a meaningful way, all the songs would have to be in the same key (which they're not); otherwise you would just be comparing one set of letters with a completely different set of letters and no meaningful connection between them. Different chord progressions sound different and evoke different emotions. Converting the chords to numbers lets you see which songs use the happy progression 1 4 5 1 despite those songs being in different keys.
This and the previous step result in the full-xml-output directory.
It does what it says it does. ¯\_(ツ)_/¯
This step results in the plain-lyrics-output directory.
It also does what it says it does. ¯\_(ツ)_/¯
This step results in the chord-progressions-output directory.
Looking ahead, we would love to expand this project by analyzing more albums, artists, and components of the music such as genre styles and influences. This would allow us to build on the methods we've developed so far and gain a deeper understanding of recurring themes, lyrical patterns, and storytelling techniques across different genres.
The pipeline we created allows us to easily take a collection of chord charts and prepare them for analysis. The goal is to be able to use the same approach on a wider range of music. By doing so, we’ll be able to see how artists develop their styles and how various cultural influences shape their lyrics over time. We will continue to refine our analysis techniques, exploring new ways to deepen our understanding of how music and lyrics interact to convey meaning.
Ultimately, we want to keep expanding our work and contribute fresh insights into the relationship between music, language, and culture.
To stay updated with the future of our project, check it out on GitHub!