I began my survey of the business potential of text mining applications after doing market research for a multi-part series of onsite content blogs. My focus began with the biomedical field, where text mining APIs have vast applications. Text mining is two words whose limits are, in a way, set only by language itself. As a creative, I was fascinated by its potential and dug as deep into the literature as a week of reading would take me. Its uses range from the identification of novel drug targets; to the identification of the names of drug gang targets in novels; to the target names of novel drug use reporting techniques; to potential novel drug functions; to parsing specific types of standardized, tested data points in order to build better models of the millions of interactions within the body and understand… well, things that were never possible before. Simply put, the increasing availability of digital medical literature and patient databases means that, while machines can’t “learn” medicine, they can at least report on the findings that may have the highest medical value. Yay, a type of mining that doesn’t ravage the environment and the lives of the people living around the ore, oil, or other hallowed resource deposit!
Text mining has also been broadly applied to interdisciplinary fields such as government, marketing, social media, history, etc. This is where text mining kept my attention.
General Process for Text Mining and Analysis
What’s broadly referred to as the digital humanities can be described as a set of conceptual and practical approaches to engaging with cultural materials, aided or mediated by computer applications. Some of these can be quite simple, such as Google Ngram, which tracks how often words appear over time and can hint at changes in their meaning. Academics are careful with their methodology because they expect a high level of scrutiny from the peer-reviewed journals they hope to publish in. Most outlets for news and cultural content, however, are far less discerning, which has led me to believe that the articles on websites lauding text mining’s usefulness in marketing overrate it.
For business applications and tracking? Yes, absolutely; in that way it’s been invaluable to me. And that’s the point: that’s all it ought to be, because there are other tools out there. So, to you content marketers emailing me to ask whether you should invest time in it: I’d say no. That said, it is still important to at least be familiar with it. I’ve produced several pieces using R as a component that I’m pleased with; however, these were short and sweet little works designed for a quick smirk and a share. Text mining could be used for work with more complex content, but so few threaded Tableau visualizations really arrest one’s attention.
Take, for example, this analysis of marijuana talk on Reddit. It is, at least to me, insightless. Can I have back the time I just spent reading it, please? The same goes for this analysis of Donald Trump’s Twitter account, doubly so when compared to something like this New York Times analysis and categorization of Trump’s tweets by actual people! It’s not just the lack of genuinely noteworthy findings that I take issue with, but also that so many of these pieces have no mooring in accepted social science principles. It’s literally spreading stupid.
The methodology of this article’s “psychological analysis” of CEOs’ personalities is deeply flawed because of its arbitrariness and the nature of public speech. Articles like this one, which uses a sentiment analysis of geo-tagged Twitter posts, declare that “Florida is the 5th happiest state,” despite the fact that a study with actual social scientific rigor, like this one by Gallup, shows that this is false. To me, this is all a testament to how low publishing standards have dropped, when time-saving relationships in the form of marketing brand and brain fodder take the place of . Lest I seem like some partisan crusader, note that I’m really just using this as a way of showing potential clients, in a non-legalese manner, the standards I hold myself to, so they can get a better vibe for me. Irony! Anyway, my discerning reader, I will continue…
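To make the problem with tweet-based “happiest state” rankings concrete, here is a deliberately naive lexicon-based sentiment scorer in Python. It is a sketch of a common, simple approach, not the method used by the article above: the word lists, the sample tweets, and the `score` and `state_happiness` helpers are all hypothetical, and real sentiment lexicons are far larger. It counts positive minus negative words per tweet, then averages those scores by state.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon; real sentiment lexicons score thousands of words.
POSITIVE = {"love", "great", "happy", "beautiful"}
NEGATIVE = {"hate", "awful", "sad", "traffic"}

def score(text):
    """Naive sentiment: count of positive words minus count of negative words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def state_happiness(tweets):
    """Average naive score per state from (state, text) pairs, happiest first."""
    by_state = defaultdict(list)
    for state, text in tweets:
        by_state[state].append(score(text))
    averages = {state: mean(scores) for state, scores in by_state.items()}
    return sorted(averages.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical geo-tagged tweets.
tweets = [
    ("FL", "Oh great, another hurricane. Just love it."),  # sarcasm scores +2
    ("FL", "Beautiful beach day"),
    ("OH", "I hate traffic"),
    ("OH", "Happy to be home"),
]
print(state_happiness(tweets))
```

On these four tweets the sketch ranks “FL” happiest, largely because a sarcastic complaint about a hurricane counts as two positive words. Errors like this need not average out at scale, which is one reason a survey that asks people directly is a sturdier basis for a “happiest state” claim.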
Qualitative and Quantitative Analysis: Examining Authors’ Notebooks
If these analyses are as bad as I say, why do they get published? Well, for one, producers like them because 99% of the time they are cheaper than a different approach. And their pseudo-scientific trappings allow publishers to justify posting whatever B.S. conclusions come out of them.
This is not to say that *all* text analysis done with APIs is valueless or mystifying rather than insightful (I find this project on screenplays fascinating and insightful), just that most of the analyses I’ve seen promoted via new media outlets are.
The article “Large-Scale Computerized Text Analysis in Political Science: Opportunities and Challenges,” which I read in graduate school and which focuses on a particular sub-branch of political science, masterfully delves into my reservations in greater detail than I will here.
What Machine Learning May Be Able To Tell Us About Creativity
What I do see as valid from a social science perspective, and what genuinely excites me, is the prospect of using textual analysis to examine the creative process of writers. One of the most rewarding projects I undertook as an undergraduate was an extensive study of Dostoyevsky’s writings, which gave me deep insight into how he worked. It’s one of the reasons I think it would be cool to cross-reference news stories with the changes writers made to their drafts. However, this approach has issues as well, since there’s no way of ascertaining, short of interviewing the author, whether they were aware of a particular issue.
A less epistemologically precarious analysis of the creative process would require word processors to continuously log a number of data points. Not only would it be cool to know at which scenes George R. R. Martin had difficulty continuing a story, or whether and how the characters in J. K. Rowling’s Harry Potter series changed as she revised, it would also be scientifically rigorous. Combining human and computer categorizations would allow for honest findings on a number of other issues as well.
Researchers could quantify whether there is a connection between the time it takes to write a scene and the type of scene it is, the characters in it, and so on. We could map how a character’s personality changes during the revision process and chart the variance in time spent on each character. If the word processing application were granted access to geolocation data, perhaps it could be determined that authors are more productive in certain locations (coffee shops, libraries, home, etc.) and at certain times of day. These are just some of the many ideas for what computer applications may be able to tell us about the creative process.
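The first idea above, relating writing time to scene type, could start as simply as grouping logged writing sessions. The Python sketch below assumes a hypothetical log of (scene id, scene type, minutes) per session; no existing word processor is claimed to record this, and the scene-type labels would come from human categorization. The `mean_minutes_by_scene_type` helper and the sample data are illustrative only. It totals the minutes spent on each scene, then averages those totals by scene type.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical session log: (scene_id, scene_type, minutes_spent).
# No real word processor is assumed to produce this format.
sessions = [
    ("s1", "dialogue", 40), ("s1", "dialogue", 25),
    ("s2", "battle", 90), ("s2", "battle", 60),
    ("s3", "dialogue", 30),
    ("s4", "description", 15),
]

def mean_minutes_by_scene_type(log):
    """Total the minutes per scene, then average those totals per scene type."""
    per_scene = defaultdict(int)
    scene_type = {}
    for scene_id, kind, minutes in log:
        per_scene[scene_id] += minutes
        scene_type[scene_id] = kind
    by_type = defaultdict(list)
    for scene_id, total in per_scene.items():
        by_type[scene_type[scene_id]].append(total)
    return {kind: mean(totals) for kind, totals in by_type.items()}

print(mean_minutes_by_scene_type(sessions))
```

With this made-up log, dialogue scenes average 47.5 minutes and the single battle scene took 150. Descriptive averages like these are only a starting point; claiming a real connection would take many scenes, many authors, and a proper statistical test.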
Using OCR, APIs, text mining, and machine learning to examine authors who used physical journals would be more difficult, but it would also be productive for drawing conclusions about the creative process of writers. I say this from my own experience: I occasionally write long passages in my physical daily journal when I feel stuck on a piece of writing.
Lastly, I just want to throw it out there that I would love it if someone bought me a copy of Henry Miller’s The Nightmare Notebook. I read it as an undergraduate, but would love to have my own.