Cataloging my 500+ books

I started reading when I was fairly young, not at child-prodigy levels, but early enough that I was that kid with my nose in a book at adult parties. It was by necessity: as a quiet only child, I didn’t have many options for entertainment.

I graduated from the beautifully illustrated Ladybird books to every Indian child’s favorite British babysitter and author, Enid Blyton. My collection of books started growing, and I was immensely proud of my little home library.

Begged for a personalized stamp on my birthday

I started charging kids money to borrow books from me, and decided I needed a personalized stamp to legitimize my shady business (see above). In any case, I was fortunate enough to have a study, shared with my mother, to house my books. Over time, I graduated from Enid Blyton to a wide assortment of genres, with a proclivity for young adult fiction to carry me through my pained adolescent years.

Now, as a fully functioning adult with no permanent home, I bring back a suitcase-load of books every time I visit home in India. I can’t bring myself to give my books away; it feels like giving away a piece of my heart, so I hoard them. Admittedly, since moving to San Francisco, I’ve been taking advantage of the excellent San Francisco Public Library system to borrow books, both physical and digital. I definitely buy fewer books as a result.

Cataloging my books, all 550 of them

Since I fortuitously got stuck at home in India during COVID-19, I decided this would be a great time to start cataloging all the books I have hoarded over the years. For the purpose of this exercise, I decided to exclude everything I read prior to middle school, even though I still have those books.

I needed a way to enter all my books into some type of database. I found that Goodreads allows you to scan books via ISBN, but it was so god-awfully slow that I quickly gave up. After some research, I landed on a free tool called Libib.com, a home library management system. This allowed for almost instantaneous scanning. Even so, given the number of books I have, it took me several hours spread out over a week or so.

Once I finished, here’s what my library of 539 books looked like.

Screenshot from the Libib dashboard

Metadata associated with each book included

authors, title, publisher, pages, isbn, description
Book metadata

The data wasn’t super clean, and I was also really curious about the ratings of the books I was reading. So I downloaded the CSV file from Libib and fed it to Goodreads. This way, I got a fairly similar, but cleaner, dataset with ratings to boot. I also added the books I had left in my apartment in San Francisco to the list. This gave me a total of about 550 books.

Some interesting findings from looking at the data (thank you Excel)

Longest books

Unsurprisingly, the fantasy books are pretty dense: four of the seven books over a thousand pages are fantasy.

| Book | Author | Pages |
|---|---|---|
| The Rise and Fall of the Third Reich | William L. Shirer | 1264 |
| Oathbringer, The Stormlight Archive | Brandon Sanderson | 1243 |
| The Lord of the Rings, #1-3 | J.R.R. Tolkien | 1178 |
| Words of Radiance, The Stormlight Archive | Brandon Sanderson | 1087 |
| Atlas Shrugged | Ayn Rand | 1080 |
| Gone with the Wind | Margaret Mitchell | 1011 |
| The Way of Kings, The Stormlight Archive | Brandon Sanderson | 1007 |
Books over a thousand pages
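I pulled this list out in Excel, but the same filter is easy to sketch in Python. This is only an illustration: the function name `longest_books` and the field names `title` and `pages` are assumptions, not columns confirmed from the actual export, and the last sample row is a made-up placeholder so the filter has something to drop.

```python
# Three rows copied from the table above, plus one hypothetical short book.
SAMPLE_BOOKS = [
    {"title": "Gone with the Wind", "pages": 1011},
    {"title": "The Rise and Fall of the Third Reich", "pages": 1264},
    {"title": "Atlas Shrugged", "pages": 1080},
    {"title": "Hypothetical Short Book", "pages": 250},  # placeholder, not real data
]


def longest_books(books, min_pages=1000):
    """Return the books with more than min_pages pages, longest first."""
    over = [book for book in books if book["pages"] > min_pages]
    return sorted(over, key=lambda book: book["pages"], reverse=True)


if __name__ == "__main__":
    for book in longest_books(SAMPLE_BOOKS):
        print(book["pages"], book["title"])
```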

Most popular authors

Considering I bought most of my books between the ages of 12 and 16, I’m not surprised by this list either. These authors were also prolific, churning books out quickly and regularly. The list is cut off at six books by the same author.

| Author | Book Count |
|---|---|
| Meg Cabot | 25 |
| Jeffrey Archer | 14 |
| Agatha Christie | 11 |
| Sophie Kinsella | 10 |
| Eoin Colfer | 8 |
| Megan McCafferty | 7 |
| Sidney Sheldon | 7 |
| P.C. Cast | 7 |
| J.K. Rowling | 7 |
| Salman Rushdie | 6 |
| Michael Crichton | 6 |
| Harlan Coben | 6 |
| Danielle Steel | 6 |
Authors who write a lot!
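For anyone who would rather script the author count than build it in Excel, here is a minimal sketch using `collections.Counter` with the same cutoff of six books. The function name `prolific_authors` is hypothetical, and the sample list is built from two counts in the table above plus one made-up author who falls below the cutoff.

```python
from collections import Counter


def prolific_authors(authors, min_books=6):
    """Count books per author and keep authors with at least min_books."""
    counts = Counter(authors)
    # most_common() orders authors from the highest count to the lowest.
    return [(name, n) for name, n in counts.most_common() if n >= min_books]


if __name__ == "__main__":
    # One list entry per book; "Hypothetical Author" is a placeholder.
    sample = (
        ["Salman Rushdie"] * 6
        + ["Meg Cabot"] * 25
        + ["Hypothetical Author"] * 2
    )
    for name, n in prolific_authors(sample):
        print(name, n)
```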

Distribution of Goodreads Ratings

This is roughly what you would expect, with most books falling in the 3.5 to 4.5 range with a few outliers. I guess I should congratulate myself for reading very average books.

Book Categories

This is the bit that took me the longest, since no single API gave me this out of the box. The Goodreads UI does have a field called “Genre”, which they very conveniently don’t make available through their public API. I finally decided to use the Google Books API, which wasn’t perfect but came the closest. If anyone knows of anything better, let me know!

I had to code a little for this, which made me realize how rusty I am. I used Python and a nifty library called pandas, which was probably overkill for this. Here’s the code I hacked together (use at your own peril).
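The snippet itself isn’t reproduced here. As a rough stand-in, and not the original code, a category lookup against the Google Books API might be sketched like this. It queries the public volumes endpoint with an `isbn:` search and reads the `categories` field from `volumeInfo`. The helper names `extract_categories` and `fetch_categories` are hypothetical, and the sketch sticks to the standard library rather than pandas.

```python
import json
import urllib.parse
import urllib.request

GOOGLE_BOOKS_URL = "https://www.googleapis.com/books/v1/volumes"


def extract_categories(payload):
    """Return the first non-empty category list in a volumes response."""
    for item in payload.get("items", []):
        categories = item.get("volumeInfo", {}).get("categories")
        if categories:
            return categories
    # Either the ISBN was not found or the volume has no category assigned.
    return []


def fetch_categories(isbn, timeout=10):
    """Look up one ISBN on Google Books and return its categories."""
    query = urllib.parse.urlencode({"q": f"isbn:{isbn}"})
    url = f"{GOOGLE_BOOKS_URL}?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_categories(json.load(response))
```

Calling `fetch_categories` once per ISBN in the exported CSV would give one category list per book. An empty list covers both failure cases: an ISBN the API does not know, and a volume with no category.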

Sadly, this only gave me results for 317 of the roughly 550 books (about 58%). Some books didn’t have a category assigned to them, and some had ISBNs that didn’t exist in the Google inventory. While this was disappointing, I was still happy I didn’t have to manually categorize any of them, although there was still a significant amount of data cleanup. Obviously categorization can be difficult and subjective, but it seems like the Google Books API could use some improvement in this area, especially for standardization across a wide range of books.

The results were again as expected. I definitely preferred fiction to non-fiction when I was younger. To this day I also love Indian fiction; we have some of the best writers in the world.

| Category | Count |
|---|---|
| Fiction | 125 |
| Young Adult Fiction | 45 |
| Indian Fiction | 34 |
| Fantasy fiction | 33 |
| Detective and mystery stories | 23 |
| Business & Economics | 20 |
| Biography & Autobiography | 13 |
| History | 11 |
| Comics & Graphic Novels | 4 |
| Poetry | 3 |
| Science | 3 |
| Philosophy | 3 |
| Total | 317 |

Overall, this was a fun exercise for a Sunday afternoon. While none of the trends were particularly surprising, I’m happy I now have a full catalog of my books. I think there are some interesting things I could do with book categorizations and generating recommendations, but that’s a project for another day.

I’m not attaching the dataset here, but if you have any fun ideas/projects, let me know and I’d be happy to share!
