Mariana 1 - A Work in Progress
Jan. 19th, 2022 07:56 am
When I first started researching Mary Gardens, finding the Mariana List from the John Stokes Collection at the University of Dayton felt like Christmas morning. I suppose only a person who enjoys sorting data would know the thrill of finding a list of things.

The Mariana List details plants associated with Mary by their common religious name and includes their botanical names and non-religious common names, along with growing regions and sourcing for the data. I've never seen a list of common religious names of plants so complete. Granted, I'm a beginner at this subject, so my references are, at this point, internet-driven rather than scholarly or primary sources... but you have to start somewhere.
When I began converting the PDF list to a sortable spreadsheet, I naively thought that I could bust it out in a week or so... well, a month later, I'm around record 160 because of the immense amount of manual labor involved in cleaning up the data.
Here's a quick overview of the conversion steps.
I did not want to type everything out again, but copy and paste was a big mess, and the Adobe function to export to Word or Excel was just as messy. I found a free website that would convert PDF to Excel, and that sort of worked. I was mindful that a number of the sites required logins and emails, so I chose the one that didn't require anything, even though it may not have been the best.
BEFORE - PDF

AFTER - EXCEL

Better but not great.
My goal for this iteration of the document is to get all the Excel columns matched up to the PDF columns. At this point, you can start to see that the occasional use of quotes and parentheses around the data will ultimately be a problem for sorting, but I realized that human error (me!) becomes a problem once I start cutting and pasting info around. So for Excel Version 1, I'm choosing not to change the data, just line everything up.
The data gets into a form like this:
Luckily, I found some advanced Excel formulas to help sort the data a little. Because of all the extra punctuation, playing around with the separated-values functions didn't work.
Here is the data in its end form for Version 1. You can see a duplication of the regions column. The best way I've found to handle this is to use the formula to separate the data and then hide the mashed-up column. When I publish online, I'll just omit the hidden column.
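For anyone who would rather script this step than use spreadsheet formulas, here is a rough Python sketch of the same idea: split a mashed-up cell into pieces while leaving the original value in place. The column names, the sample value, and the separators (`;` and `/`) are all made up for illustration and are not taken from the Mariana List itself.

```python
import re

def split_mashed_cell(cell, separators=r"[;/]"):
    """Split one combined cell into trimmed parts, dropping empty pieces."""
    parts = re.split(separators, cell)
    return [p.strip() for p in parts if p.strip()]

# Hypothetical sample row; the real list's values and separators may differ.
row = {"regions_raw": "England; Germany / France"}

# Keep the original "mashed" value untouched and add the pieces next to it,
# similar to keeping the source column and hiding it later.
for i, part in enumerate(split_mashed_cell(row["regions_raw"]), start=1):
    row[f"region_{i}"] = part

print(row)
```

Because the raw column is never modified, a bad split can be redone without going back to the PDF.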
It is still time-consuming, even with the extra hacks.
Once that is done, the next versions will be to clean up the punctuation around the Religion Names and get the regions into their own columns so they can be sorted.
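As a hedged sketch of how that punctuation cleanup could work in a script (Python here, not what was done in Excel), a sort key can ignore wrapping quotes and parentheses without changing the stored names. The sample names below are placeholders, not entries from the list, and `sort_key` is a hypothetical helper.

```python
def sort_key(name):
    """Lowercase key with wrapping quotes, parentheses, and spaces removed."""
    return name.strip().strip("\"'() ").lower()

# Placeholder names showing the problem: a plain sort puts quoted and
# parenthesized entries ahead of everything else.
names = ['"Lily"', "(Rose)", "Marigold"]

plain = sorted(names)                  # punctuation decides the order
cleaned = sorted(names, key=sort_key)  # letters decide the order

print(plain)
print(cleaned)
```

The names themselves stay exactly as they were typed; only the comparison used for sorting ignores the extra punctuation.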
