Saturday, August 07, 2010
Google Knows Just How Many Books There Are?
It has been interesting to read a Google Books blog post by Leonid Taycher, ‘Books of the world, stand up and be counted’, in which he asks the age-old question ‘Just how many books are out there?’ Some would simply respond, ‘Who cares?’ Who but the most anal of creatures, or Google, would want the precise figure, let alone one to the nearest ten?
What is interesting is that, in their quest for this Holy Grail, they have stumbled across revelations most people knew but ignored. They clearly want to capture every rendition and edition of a work, ‘as we would like to distinguish between -- and scan -- books containing, for example, different forewords and commentaries.’ They also seem fond of the term “tome”, which appears to mean little more than a rendition, but they like it.
They have discovered that an ISBN may be assigned to anything from CDs to bookmarks to T-shirts, and they have been diligent enough to identify one assigned to a turkey prod and 1,000 assigned to T-shirts.
So what is their final number today? Once they have merged duplicate records into what they refer to as ‘clusters’, the answer drops from an initial 600 million to around 210 million. They must then exclude those T-shirts, audio recordings, videos, microfilm and that turkey prod! The answer then becomes some 146 million. They then turn to the world of serials, which they estimate at 16 million, and cautiously note that the number is likely to rise ‘as our disambiguating algorithms become smarter’. Taking the serials out leaves roughly 130 million.
The answer today is 129,864,880! We note they went from rounded millions to the nearest ten!
Well, we now know yet another piece of useless information, and how Google determines its targets. However, we wonder why, with this level of diligence, they still cannot determine the number of orphan works. No doubt they have an algorithm and an army of people assigned to that too!