We chose this topic because many people compare the modern rock band Greta Van Fleet to the classic rock band Led Zeppelin. One author writes, "From the moment the Grammy-winning four-piece burst on to the scene, the Michigan rockers [of Greta Van Fleet] have drawn heavy comparison with hard rock’s most iconic and influential band [Led Zeppelin]." We wanted to use XML and Python tools to analyze their music more deeply and see whether that comparison holds up.
"XProc is a declarative programming language with an XML syntax for processing data in pipelines. XProc is based on the concept of pipelines. Pipelines take data as input and generate output data. A pipeline consists of steps which perform various actions on the data. These actions range from reading and writing data, Web requests and complex transformations and validations. XProc is mainly focused on manipulating XML but can handle HTML, JSON, text and binary data as well."
Basically, XProc makes our lives easier and more organized when we have lots of input data and want to transform/output it in multiple steps.
Here is an example of one of our pipeline files (.xpl file extension, for "XProc PipeLine"). The pipeline contains multiple steps, which are explained in detail below.
You can run this pipeline file from the command line using an XProc processor. There are two open-source options: XML Calabash and MorganaXProc-III. We installed and worked with both in DIGIT 210, but ultimately we used MorganaXProc-III because it ran more efficiently.
Using a step called <p:identity message="[text here]"/>, you can tell your pipeline to print messages to the command line as it runs. This helps you understand what the pipeline is doing and exactly when it hits an error. This is especially useful when running a <p:for-each> loop to process each song in an album. We created a variable called filename and used $filename in these messages to track which song file caused an error.
Note: There is one pipeline file for each album, stored with the input data for each album. Ideally, one pipeline file could handle everything, but we chose to use nearly identical pipeline files for each album with only the file paths differing. This was because we couldn’t easily create a single pipeline to read all directories (albums) of songs and output them in a consistent, organized manner. So, each pipeline processes one album at a time.
Our raw text files are chord charts of every song on the first four albums of each artist, and they are organized as follows: ../Artist/Album/song-##.txt.
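As a rough illustration of that layout, here is a minimal Python sketch that groups song files by artist and album. It is not part of our pipeline; the function name songs_by_album and the root argument are hypothetical, and it assumes that song numbers are always two digits.

```python
from collections import defaultdict
from pathlib import Path


def songs_by_album(root):
    """Map (artist, album) to the sorted song file names found under root.

    Assumes the layout root/Artist/Album/song-##.txt described above.
    """
    albums = defaultdict(list)
    # One directory level for the artist, one for the album, then the song files.
    for path in sorted(Path(root).glob("*/*/song-[0-9][0-9].txt")):
        artist, album = path.parts[-3], path.parts[-2]
        albums[(artist, album)].append(path.name)
    return dict(albums)
```

Calling songs_by_album("..") from inside one of the album folders would return one entry per album, which makes it easy to spot a missing or misnamed song file.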
Our raw-text resources also include Led Zeppelin's five later albums, but because of time constraints and because Greta Van Fleet's discography currently contains four albums, we decided to process only Led Zeppelin's first four albums.
The chord charts we used are in the "chords over lyrics" format, meaning that the chords (roughly) line up with the lyrics they are played over. We did not check this alignment for accuracy because, for this project, this more complex data would not be analyzed or preserved throughout the pipeline process.
These chord charts were pulled from Ultimate Guitar, an online platform for guitarists and musicians to find user-created learning resources for millions of songs. They offer the ability to download chord charts as PDFs, but for our purposes, copying and pasting into .txt files was more efficient. The accuracy and formatting consistency of these resources is surprisingly high considering they are all user-created, but it's still ideal to check them manually. @mrs7068 checked a good portion of the songs, but he points out that, while largely inconsequential to the project, there are still many errors throughout the song files.
The Music Encoding Initiative (MEI) gets an honorable mention here because we researched it in the early stages of our project. It is an expansive "open-source effort to define a system for encoding musical documents in a machine-readable structure" (music-encoding.org). We chose to mention it here, after the ixml step, because the resulting basic XML structure could be transformed into MEI schema-compliant data. Ultimately, given the scope of this project, conforming to MEI's guidelines was unnecessary. Still, the guidelines are worth checking out if you find our project interesting!
\n(\s*([A-Z][#ba-z/0-9]*) *([A-Z][#ba-z/0-9]*)?)*\n
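The expression above appears to target lines that contain only chord symbols. Here is a minimal sketch of how it behaves, assuming Python's re module (the project may have applied the pattern in a different tool). The helper name chord_runs and the sample chart are made up for illustration.

```python
import re

# Pattern copied from the text above; it needs a newline on both sides of the chords.
CHORD_LINE = re.compile(r"\n(\s*([A-Z][#ba-z/0-9]*) *([A-Z][#ba-z/0-9]*)?)*\n")


def chord_runs(chart):
    """Return the chord symbols found in each matched run of chord-only lines."""
    runs = []
    for match in CHORD_LINE.finditer(chart):
        symbols = match.group(0).split()
        if symbols:  # the pattern also matches an empty line; skip those
            runs.append(symbols)
    return runs


# Made-up sample in "chords over lyrics" layout (not taken from the project data).
sample = "\nAm  G  C\nhello darkness my old friend\nF#m  D/F#\nsecond line of lyrics\n"
print(chord_runs(sample))
```

Note that the character class also accepts lowercase letters, so a lyric line made up entirely of capitalized words would match as well. That is one more reason to check the charts manually.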