petrushkagoogol Posted January 18, 2018

Can we write a program to automate book previews? Here is how I would go about it:

1. Scan the book and store it in PDF format.
2. Convert the PDF into DOC format.
3. Create hashtables of verbs, adverbs, pronouns, interjections and conjunctions.
4. Identify the nouns and rank person names according to frequency (by exclusion, using 1-3). Ranking will help to identify major and minor characters.
5. Create a hashtable of country names to identify the setting.
6. Scan the index to search for keywords that help identify the genre.
7. Check the author to see if he / she is a known celebrity. (This data could be collated in a hashtable.)
8. Search for foul language to see if the book is OK for kids.
9. Check whether translations are available.
10. Check the timeline: minimum year to maximum year.
11. Check for existing reviews.
12. Collate the data into sentences that are readable. For instance:

* The genre of the book is spy-thriller.
* The main characters are Jack, Jill and Bob.
* The story is set in Afghanistan.

etc. This is done by using a template and substituting the variables. This will at least reflect key data about the book.

Do you think this is feasible?
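Steps 3-5 and 12 of the list above can be sketched in a few lines of Python. This is only a rough illustration, not a tested design: the `FUNCTION_WORDS` and `COUNTRIES` tables, the helper names and the sample text are all made up for the example, and treating every capitalised word outside those tables as a person name is a crude assumption (a sentence-initial word that is missing from the table would be miscounted).

```python
import re
from collections import Counter
from string import Template

# Hypothetical stand-ins for the lookup tables in steps 3 and 5.
# Real tables would be far larger.
FUNCTION_WORDS = {"the", "and", "then", "he", "she", "to", "in", "a"}
COUNTRIES = {"Afghanistan", "France", "India"}


def rank_names(text, top=3):
    """Step 4: rank capitalised words that are in neither table by frequency."""
    words = re.findall(r"[A-Za-z]+", text)
    names = [
        w for w in words
        if w[0].isupper() and w.lower() not in FUNCTION_WORDS and w not in COUNTRIES
    ]
    return [name for name, _ in Counter(names).most_common(top)]


def find_setting(text):
    """Step 5: return the most frequently mentioned country, or None."""
    words = re.findall(r"[A-Za-z]+", text)
    hits = Counter(w for w in words if w in COUNTRIES)
    return hits.most_common(1)[0][0] if hits else None


def join_names(names):
    """Format ['Jack', 'Jill', 'Bob'] as 'Jack, Jill and Bob'."""
    if len(names) < 2:
        return "".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]


# Step 12: a template with variables to substitute.
PREVIEW = Template("The main characters are $names. The story is set in $setting.")


def make_preview(text):
    return PREVIEW.substitute(
        names=join_names(rank_names(text)),
        setting=find_setting(text) or "an unknown location",
    )


sample = ("Jack flew to Afghanistan. Then Jack met Jill. "
          "Jill and Jack found Bob in Afghanistan.")
print(make_preview(sample))
```

For the sample text this prints "The main characters are Jack, Jill and Bob. The story is set in Afghanistan." A real book would need a much better name detector than the capital-letter test used here.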
exchemist Posted January 18, 2018

petrushkagoogol said: "Can we write a program to automate book previews? [...] Do you think this is feasible?"

No. Next silly question...
sanctus Posted January 18, 2018

Exchemist? This is an application of NLP (Natural Language Processing). I mean, I was writing recommendation engines, some based on how similar articles are. There you fish out the defining words (e.g. frequent in the current article but not frequent across all the articles), and if there are enough matches you say the articles are similar.

Petrush's idea is similar but easier. For 4 you'd use some TF-IDF (term frequency times inverse document frequency) variation, since you only have a corpus of 1 document. For 6, some sentiment-analysis-equivalent neural network to define the genre, etc.
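The "defining words" idea can be illustrated with a plain TF-IDF score. The sketch below is an assumption about one common textbook form (term frequency times the log of inverse document frequency), not the code behind any actual recommendation engine; the function names and the three toy documents are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs):
    """Return one {term: score} dict per document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    scores = []
    for tokens in token_lists:
        tf = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


def defining_words(score, k=3):
    """Top-k terms by score, ties broken alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


docs = [
    "the spy met the agent in the city",
    "the cook baked the bread in the kitchen",
    "the spy followed the agent",
]
scores = tfidf(docs)
print(defining_words(scores[1]))
```

A word such as "the" appears in every document, so its inverse document frequency is log(1) = 0 and it never counts as a defining word. The same thing happens to every term when the corpus holds a single document, which is presumably why a variation of the formula (for instance, scoring against a separate reference corpus) would be needed for one book.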
exchemist Posted January 18, 2018

sanctus said: "This is an application of NLP (Natural Language Processing). [...]"

Classifying a book by genre, or identifying similarities, is not writing a review, which I assume is what was meant by "preview". A review involves someone whose opinion might command a reader's respect giving an opinion on the qualities and demerits of the book, with reference to subject, plot, characters, style etc.
sanctus Posted January 19, 2018

OK, I concentrated on the points in the list, not the title. And the points given are feasible.
petrushkagoogol Posted January 19, 2018

sanctus said: "Ok, I concentrated on the points in the list not the title. And the points given are feasible"

I could improve the design by:

* Moving data from a database where a preview has been generated into an archive
* Using a webservice to get the latest reviews
* Storing generated previews as XML to enable efficient storage of data and transfer across firewalls
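Storing a generated preview as XML could look roughly like the sketch below, which uses Python's standard `xml.etree.ElementTree` module. The element names (`preview`, `title`, `genre`, `characters`, `character`, `setting`) and the example values are assumptions made for illustration; the thread does not specify a schema.

```python
import xml.etree.ElementTree as ET


def preview_to_xml(title, genre, characters, setting):
    """Serialise the preview fields into an XML string (hypothetical schema)."""
    root = ET.Element("preview")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "genre").text = genre
    chars = ET.SubElement(root, "characters")
    for name in characters:
        ET.SubElement(chars, "character").text = name
    ET.SubElement(root, "setting").text = setting
    return ET.tostring(root, encoding="unicode")


def preview_from_xml(xml_text):
    """Parse the XML string back into a plain dict."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "genre": root.findtext("genre"),
        "characters": [c.text for c in root.iter("character")],
        "setting": root.findtext("setting"),
    }


xml_text = preview_to_xml(
    "Example Book", "spy-thriller", ["Jack", "Jill", "Bob"], "Afghanistan"
)
print(xml_text)
print(preview_from_xml(xml_text))
```

The round trip through `preview_from_xml` gives back the same fields, so the XML string could be archived or sent to another service and read again later.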