
These days, if you have the right tools and an idea, it’s pretty easy to create a proof-of-concept project. Of course, by the right tools I mean Python – but after some waffling about which cloud service to use, for this project it also means AWS.

I’ve been taking the Python for Data Science and Machine Learning Bootcamp over at Udemy. I recommend it highly if you are interested in that type of thing. Doing the lessons and exercises got me thinking, of course, about how I might use what I was learning to create something only I was interested in…

What I came up with was the idea of taking a bunch of random text and writing some code that turned it into a totally different conversation: creating something new and unique from something that already existed. So that’s what I did. To what degree it was successful is subjective.

Ingredients:


The nltk.corpus package deserves a blog post of its own. NLTK:

“defines a collection of corpus reader classes, which can be used to access the contents of a diverse set of corpora.”

What this means is that it is a collection of collections of words. The different collections are sourced from a diverse range of texts, from Hamlet to Twitter…

If you are following along at home, the link up there in the ingredients is the one you want for NLTK. The biggest pitfall to avoid is forgetting to download the corpus before you try to use it. Install NLTK like you would any other package and then:

import nltk
nltk.download()

That will launch a download app; grab as much of it as you want, or just get it all if you want to make sure you don’t miss anything. I was interested in the tweets, so I did:

from nltk.corpus import twitter_samples
import string

#print(twitter_samples.fileids())
#print(twitter_samples.strings('tweets.20150430-223406.json'))
tweets = twitter_samples.strings('positive_tweets.json')

to get some text into a variable called tweets. I then performed a bunch of tokenization and tagging, figuring out which parts of speech my words were. I filtered a bunch of stuff out with list comprehensions like:

words=[ w for w in tweets if "http" not in w ] # I love list comprehension!
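The same filtering idea extends to cleaning individual tokens. A sketch with made-up stand-in tweets (the real ones come from twitter_samples.strings('positive_tweets.json')), dropping anything that looks like a URL and stripping leading/trailing punctuation:

```python
import string

# Toy stand-ins for the corpus tweets.
tweets = ["I love Python! http://example.com", "Great day :)"]

# Split into tokens, dropping any token that contains a URL fragment.
words = [w for tweet in tweets for w in tweet.split() if "http" not in w]

# Strip punctuation off each end, then drop tokens that were pure punctuation.
cleaned = [w.strip(string.punctuation) for w in words]
cleaned = [w for w in cleaned if w]
```

The last filter matters because emoticons like “:)” strip down to an empty string.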

Finally, after playing with these positive tweets, getting rid of stuff I didn’t want, playing Python Mad Libs, tagging parts of speech, etc., I came up with some text that would become my conversation. I created two characters from some words tagged as proper nouns (Gossip Girl and Whiskers).

I gradually refined and regression-tested my code until I got them talking to each other. At that point, all that meant was that both characters had lines that they “thought” would work with the other character’s lines… meaning I was spitting out two different lists of strings, one per character, that seemed to relate.
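Turning those two lists into a back-and-forth transcript is just a matter of alternating them. A sketch with hypothetical line lists standing in for the ones my filtering produced:

```python
from itertools import zip_longest

# Hypothetical output lists, one per character.
gossip_girl = ["Hello Whiskers", "Did you hear the news?"]
whiskers = ["Hello Gossip Girl", "Tell me everything"]

# Alternate the two characters' lines; zip_longest handles uneven lists.
conversation = []
for gg_line, w_line in zip_longest(gossip_girl, whiskers):
    if gg_line is not None:
        conversation.append(("Gossip Girl", gg_line))
    if w_line is not None:
        conversation.append(("Whiskers", w_line))
```

A (speaker, line) tuple per turn keeps the transcript ready to feed to a text-to-speech service one voice at a time.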

Amazon Polly

Amazon Polly is an example of where AWS really kicks its competitors’ asses. Polly is an AWS service that converts text (like lists of strings) into (almost) lifelike speech. In this case, Polly allowed me to make Gossip Girl and Whiskers have a conversation.
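Giving each character their own voice is a small boto3 call per line. A minimal sketch, assuming boto3 is installed and AWS credentials are configured; the voices are Polly’s built-in “Joanna” and “Matthew”, and the character-to-voice mapping is my own choice, not anything from the post:

```python
# Map each character to one of Polly's built-in voice ids (my own choice).
VOICES = {"Gossip Girl": "Joanna", "Whiskers": "Matthew"}

def synthesize_line(speaker, text, out_path):
    """Synthesize one line of dialogue to an mp3 file via Amazon Polly."""
    import boto3  # imported here so the mapping above works without AWS set up

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=text,
        VoiceId=VOICES[speaker],
        OutputFormat="mp3",
    )
    # AudioStream is a streaming body; write its bytes out as the mp3.
    with open(out_path, "wb") as f:
        f.write(response["AudioStream"].read())
```

Calling this once per (speaker, line) turn yields a stack of mp3 clips that can be concatenated into the finished conversation.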

Here is just the audio of the conversation between Gossip Girl and Whiskers:


And the full video: