How to Do Sentiment Analysis – Intro to Deep Learning #3


Hello world, it’s Siraj, and today we’re going to use machine learning to help us understand our emotions. Our emotional intelligence distinguishes us from every other known living being on Earth. These emotions can be simple, like when you get so hyped, all you can hear is Gasolina by Daddy Yankee. And we’ve invented language to help us express them to others. But sometimes words are not enough; some emotions have no direct English translation. For example, in German, Waldeinsamkeit is the feeling experienced when you’re alone in the woods, connecting with nature. In Japanese, mono no aware is the awareness of the impermanence of all things and the gentle sadness at their passing. Emotions are hard to express, let alone understand, but that’s where AI can help us. AI can understand us better than we do by analyzing our emotional data, helping us make optimal decisions for goals that we specify, like a personal life coach slash therapist slash Denzel Washington. But how would it do this?
There are generally two main approaches to sentiment analysis. The first is the lexicon-based approach. We first split some given text into smaller tokens, be they words, phrases, or whole sentences; this process is called tokenization. Then we count the number of times each word shows up, and the resulting tally is called the bag-of-words model. Next we look up the subjectivity of each word in an existing lexicon, which is a database of emotional values for words pre-recorded by researchers. Once we have those values, we can compute the overall subjectivity of our text.
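To make that concrete, here is a minimal sketch of the lexicon-based pipeline in plain Python. The mini-lexicon and its scores are made up for illustration, not taken from a real research database:

import re
from collections import Counter

# A toy lexicon: word -> subjectivity score (illustrative values only).
LEXICON = {"great": 0.8, "love": 0.9, "boring": -0.7, "terrible": -0.9}

def tokenize(text):
    # Tokenization: split the text into lowercase word tokens.
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(tokens):
    # Bag of words: tally how many times each token shows up.
    return Counter(tokens)

def subjectivity(text):
    # Average the lexicon scores of known words, weighted by their counts.
    bow = bag_of_words(tokenize(text))
    scored = [(LEXICON[w], n) for w, n in bow.items() if w in LEXICON]
    if not scored:
        return 0.0
    return sum(score * n for score, n in scored) / sum(n for _, n in scored)

print(subjectivity("I love this great movie"))       # > 0, positive
print(subjectivity("What a boring, terrible film"))  # < 0, negative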
The other approach uses machine learning. If we have a corpus of, say, tweets that are labeled either positive or negative, we can train a classifier on it; then, given a new tweet, it will classify it as either positive or negative. So which approach is better? Don’t ask me. No yeah, totally ask me. Well, using a lexicon is easier, but the learning approach is more accurate. There are subtleties in language that lexicons are bad at, like sarcasm, which seems to mean one thing but really means another. Deep neural nets can understand these subtleties because they don’t analyze text at face value; they create abstract representations of what they’ve learned. These generalizations are called vectors, and we can use them to classify data.
Let’s learn more about vectors by building a sentiment classifier for movie reviews, and I’ll show you how to run it in the cloud. The only dependency we’ll need is tflearn, and I’m using it since it’s the easiest way to get started building deep neural networks. We’ll import a couple of helper functions that are built into it as well, and I’ll explain those when we get to them. The first step in our process is to collect our data set. tflearn has a bunch of pre-processed data sets we can use, and we’re going to use a data set of IMDB movie ratings. [MUSIC]
We’ll load it using the load_data function, which will download our data set from the web. We’ll name the path where we want to save it, with the extension pkl, meaning it’s a pickled byte stream; this makes it easier to convert to other Python objects like lists or tuples later. We want 10,000 words from the database, and we only want to use 10% of the data for our validation set, so we’ll set the last argument to 0.1. load_data will return our movie reviews split into a training and testing set. We can then further split those sets into reviews and labels and set them equal to X and Y values.
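In code, the loading step looks roughly like this; it’s a sketch based on the description above, with variable names of my own choosing:

# Collect the data set: tflearn ships a pre-processed IMDB helper.
import tflearn
from tflearn.datasets import imdb

# Download (or load) the pickled data set, keep the 10,000 most frequent
# words, and hold out 10% of the training data for validation.
train, test, _ = imdb.load_data(path='imdb.pkl', n_words=10000,
                                valid_portion=0.1)

# Split each set into reviews (X) and labels (Y).
trainX, trainY = train
testX, testY = test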
Training data is the portion our model learns from; validation data is part of the training process. While training data helps us fit our weights, validation data helps prevent overfitting by letting us tune our hyperparameters accordingly. And testing data is what our model uses to test itself, by comparing its predicted labels to the actual labels. So test yourself before you wreck yourself.
Now that we have our data split into sets, let’s do some pre-processing. We can’t just feed text strings into a neural network directly; we have to vectorize our inputs. Neural nets are algorithms that essentially just apply a series of computations to matrices, so converting our inputs to numerical representations, or vectors, is necessary. The pad_sequences function will do that for our review text. It’ll convert each review into a matrix and pad it. Padding is necessary to ensure consistency in our inputs’ dimensionality. It will pad each sequence at the end with a value we specify, zero, until it reaches the maximum possible sequence length, which we’ll set to 100. We also want to convert our labels to vectors, and we can easily do that using the to_categorical function. These are binary vectors with two classes: 1, which is positive, or 0, which is negative. Yo hold up. Vectors got me feeling like. [MUSIC]
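Continuing the sketch, the pre-processing step might look like this (the nb_classes argument matches the older tflearn API this tutorial appears to use; newer versions may differ):

from tflearn.data_utils import to_categorical, pad_sequences

# Pad (or truncate) every review to exactly 100 word indices, using 0 as
# the padding value, so all inputs have the same dimensionality.
trainX = pad_sequences(trainX, maxlen=100, value=0.)
testX = pad_sequences(testX, maxlen=100, value=0.)

# Convert each 0/1 label into a two-class binary vector, e.g. 1 -> [0., 1.].
trainY = to_categorical(trainY, nb_classes=2)
testY = to_categorical(testY, nb_classes=2)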
We can intuitively define each layer in our network as its own line of code. First is our input layer; this is where we feed data into our network. The only parameter we’ll specify is the input shape. The first element is the batch size, which we’ll set to None, and then the length, which is 100, since we set our max sequence length to 100. Our next layer is our embedding layer. Its first parameter is the output we receive from the previous layer, and by the way, for every layer we write, we’ll be using the previous layer’s outputs as inputs. This is how data flows through a neural network: at each layer it’s transformed, like a seven-layer dip of computation. We’ll set the input dimension to 10,000, since that’s how many words we loaded from our data set earlier, and the output dimension to 128, which is the number of dimensions in our resulting embeddings. Next, we’ll feed those values to our LSTM layer. This layer allows our network to remember data from the beginning of the sequences, which will improve our prediction. We’ll set dropout to 0.8; dropout is a technique that helps prevent overfitting by randomly turning different pathways in our network on and off. Our next layer is fully connected, which means that every neuron in the previous layer is connected to every neuron in this layer. We have a set of learned feature vectors from previous layers, and adding a fully connected layer is a computationally cheap way of learning non-linear combinations of them. It’s got two units, and it’s using the softmax function as its activation function. This will take in a vector of values and squash it into a vector of output probabilities between 0 and 1 that sum to 1.
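As a quick standalone illustration of what softmax does (this aside uses NumPy; the tflearn model applies softmax internally):

import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))  # subtract the max for numerical stability
    return e / e.sum()

# Two raw unit outputs squashed into probabilities that sum to 1.
print(softmax(np.array([2.0, 1.0])))  # -> [0.731..., 0.268...]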
We’ll use those values in our last layer, which is our regression layer. This will apply a regression operation to the input. We’re going to specify an optimizer method that will minimize a given loss function, as well as the learning rate, which specifies how fast we want our network to train. The optimizer we’ll use is adam, which performs gradient descent. And categorical cross-entropy is our loss; it helps find the difference between our predicted output and the expected output.
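Stacked together, the layers described above come out to roughly one line of code each. This is a sketch; the learning rate here is an illustrative value, not one quoted from the video:

# Build the network, one tflearn layer per line.
net = tflearn.input_data([None, 100])          # batch size (None) x sequence length
net = tflearn.embedding(net, input_dim=10000,  # vocabulary size loaded earlier
                        output_dim=128)        # embedding dimensions
net = tflearn.lstm(net, 128, dropout=0.8)      # LSTM layer with dropout
net = tflearn.fully_connected(net, 2, activation='softmax')  # two classes
net = tflearn.regression(net, optimizer='adam', learning_rate=0.001,
                         loss='categorical_crossentropy')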
After building our neural network, we can go ahead and initialize it using tflearn’s deep neural net function, DNN. Then we can call our model’s fit function, which will launch the training process for our given training and validation data. We’ll also set show_metric to True so we can view the log of accuracy during training.
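Those two calls might look like this (the batch size is an assumed value for illustration):

# Initialize the model and launch training.
model = tflearn.DNN(net, tensorboard_verbose=0)
model.fit(trainX, trainY,
          validation_set=(testX, testY),  # helps spot overfitting during training
          show_metric=True,               # log accuracy as we train
          batch_size=32)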
So to demo this, we’re going to run it in the cloud using AWS. What we’ll do is use a prebuilt Amazon Machine Image. This AMI can be used to launch an instance, and it’s got every dependency we need built in, including TensorFlow, CUDA, Lil Wayne’s deposition video. If we click on the orange Continue button, we can select the type of instance we want. I’ll go for the smallest because I’m poor still, but ideally we’d use a larger instance with GPUs. Then we can accept the terms in one click. Next, we go to our AWS console by clicking this button, and after a while, our instance will start running. We can copy and paste the public DNS into our browser, followed by the port we specified for access. For the password, we’ll use the instance ID. Now we’re in our instance environment, built with our AMI, and we can play with a Jupyter Notebook hosted on AWS. We’ll create a new notebook and paste our code in there. And now we can run it, and it will start training just like that.
So to break it down: there are two main approaches to sentiment analysis, using a lexicon of pre-recorded sentiment, or using state-of-the-art but more computationally expensive deep learning to learn generalized vector representations from words. A feedforward net accepts fixed-size inputs, like binary numbers, but recurrent neural nets help us learn from sequence data, like text. And you can use AWS with a pre-built AMI to easily train your models in the cloud without dealing with dependency issues.
The Coding Challenge winner from last week is Ludo Bouan. Ludo architected his neural net so that stacking layers was as easy as a line of code per layer. Wizard of the Week. And the runner-up is See Jie Xun; he accurately modified my code to reflect multilayer backpropagation. The Coding Challenge for this week is to use tflearn to train a neural network to recognize sentiment from a video game review data set that I’ll provide. Details are in the README; post your GitHub link in the comments, and I’ll announce the winner in one week. Please click that Subscribe button, and if you want to see more videos like this, check out this related video. And for now, I gotta figure out what the f PyTorch is, so thanks for watching.

100 thoughts on “How to Do Sentiment Analysis – Intro to Deep Learning #3”

  • How should one interpret the trainX vector? What do the numbers mean? Is it a bag of words representation/tokenized representation?

  • Siraj, you have an input with 100 values, where 0 is empty and the other values are index representations of some word, right? Then you use an embedding with an output dimension of 128. I didn't understand the structure of the DNN. Please tell me if I'm right or not.

    Will each value of the input generate a vector of 128? If that's true, you will have a 12800-value input vector to the LSTM. I know that I'm wrong, but can you explain how exactly the input will pass through the DNN?

  • Hi, Siraj! First of all – thanks for your videos!
    Can you explain why you use an LSTM for classifying text represented by same-size vectors? As I understand it, an LSTM is needed when text is sent to the neural net word by word, but in this example the text is sent to the neural net all at once.

  • Great flames in the rap 🙂
    Always good to see you adding a new dimension to machine learning… thanks for making ML so much fun

  • I love your videos; I wish they were a little longer to set a better pace. Also, I saw your previous video and implemented that textblob polarity thing. It seemed to not be giving the right predictions. Anyway, I want to know how textblob is trained by default, as I did not train it. And is there any way to implement this one outside of AWS and such? I know it is a dependency thing, but can't we install it directly on our PC? I would be in debt if you answered all my queries.

  • Hey Siraj! I am from New Delhi, India. I have been following your channel recently and I find it very interesting. I am a student of computer science and I plan on doing a college project on sentiment analysis. I tried naive Bayes using NLTK but it doesn't seem to be so accurate. Your videos have encouraged me to try this using deep learning (TensorFlow). My main areas of confusion include what specific reviews I should take up and how to go about using TensorFlow. I know you'd be getting this daily, but if you could guide me with this, it would be a great help.

  • How do you test the trained model? I tried
    model.load('model.tfl')
    new_sentence = 'The movie was terrible!'
    testdata = pad_sequences(new_sentence, maxlen=100, value=0.)
    prob = model.predict(testdata)
    print(prob)

  • Wish someone would remake these deep learning videos but instead go over the conceptual side of it without the annoying fucking quick cuts. They're not funny when you're attempting to understand this; they're distracting. It's also fairly obvious by looking at the code that there are VERY important pieces of information that have been intentionally left out for some inexplicable reason.

  • Hi Siraj! You have created great videos on sentiment analysis. Do you have a video for beginners? I mean starting from the very beginning, for example what software, materials/data, and skills we need? I have an IT background but am not strong in programming.

  • mono no aware is some advanced stuff.

    In the Buddhist Vipassana tradition which I've followed, there are 16 stages or knowledges known as nyanas. Mono no aware may be the feeling experienced after the Arising and Passing stage in the Dissolution Nyana (known as Bhanga Nyana in Pali). Some authors have equated it to the Dark night of the soul in the Christian tradition.

    Are you into meditation, Siraj?

  • The price of giving a lot of information, fast, with some cool stuff embedded, is that you are not really teaching but just showing how smart you are. I still find everything you bring valuable and plan to fight with it (and learn eventually). I do realize that you know what you're doing and invest a lot in the material. Thanks. Please consider slowing down and really teaching.

  • Siraj, your videos are great, man; however, I never see any outputs/results after you run the program. Why is that?

  • I'm rewatching this again for my project. Haha. Learned a lot in less than 10 minutes. Thank you very much! Ur the best Siraj!

    Btw, I'm wondering what accuracy score did you get from training on imdb dataset. Thanks. 🙂

  • Now I know what happened to Religetard, haha. Funny, but it's understandable why the change. These days it would be hard to be a public figure and express views like that.

  • Siraj, how do I apply this to image recognition and show the results? I got very interested in this. Thanks a lot

  • Once you have this neural net trained, can it be optimized to run locally, or does it have to be run as is every time? BTW, thanks for the great videos!!

  • For those struggling to understand the data on printing it :
    "The words have been replaced by integers that indicate the absolute popularity of the word in the dataset. The sentences in each review are therefore comprised of a sequence of integers"

    source : http://machinelearningmastery.com/predict-sentiment-movie-reviews-using-deep-learning/

  • Wow I am really impressed, this is the most catchy, informative and just awesome course I have ever seen. Thank you so much!!

  • Hey Siraj, I am getting an error "list index out of range" for the last statement. I tried your code as well just in case I made a typo but I am still getting the same error.
    Thanks for all the videos.

  • Awesome videos. With a lot of replaying and background research I'm almost following… However, what's in the IMDB database? What are the labels? Single words? What do they spell out?

    And the storage format of the descriptions… is it a matrix, with each location having an index to a word? Where are the words?

    I've read somewhere something about only the frequency of words being stored? There are a lot of unknowns for an AI pleb such as myself…

  • Hi. Love your videos and humor. Curiously, once training is complete, how do you feed a movie review into the model or access the model? (i.e. Where is model saved at the completion of training?)

  • Hi Siraj,

    I think your videos are great but I spotted a small mistake in this one. In the last part you call validation set like this:
    'validation_set(testX, testY)'

    Unfortunately this doesn't work. I did some research and learned that you may need to call it like this:

    'validation_set=(testX, testY)'.

    Not sure if this a versioning problem.

    The other thing is… you go so fast, but you don't actually show how to run this or how to actually use it. You quickly move on to AWS, but there's no explanation of how I'm supposed to use it. How do I supply new text to it? How can I get a prediction out?

    I'd really appreciate some help with this because I'm trying to apply this to real world problems but I can't get an example running at the moment. 🙁

    Thank you for all of the information you have provided though.

    Kiran

  • Just curious, how long did it take y'all to train the neural net that siraj wrote. I ran mine on a gtx 1080 Ti and it took 26 seconds per epoch.

  • Python version 3.6.1, Anaconda 4.4.0. Code for this version of the tutorial:
    https://github.com/ankitAMD/Ankit_Siraj5_Sentiment_Analysis/blob/master/siraj_tut-5_sentiment_analysis.ipynb

    https://github.com/ankitAMD/Ankit_Siraj5_Sentiment_Analysis

  • I subscribed because I like the content, but I think these videos are more for people who have an intermediate-level understanding of Python. I can follow the conceptual bits, but as soon as you start programming it in Python, you explain your steps only at a very high level.
    Could you recommend a place where I can learn programming ML by building on the fundamental concepts?

    btw nice rap ;p

  • Spoiler Alert:
    You have to introduce your personal data (including your credit card number and shit) in AWS in order to test this tutorial properly.

  • How do I save the trained model and reload it on another python program? Can you give an example on how to use the model, and the real world application?

  • the to_categorical line throws an error saying
    shape mismatch: indexing arrays could not be broadcast together with shapes (22500,) (22500,2)

  • Getting an error not sure what I've done wrong:

    Traceback (most recent call last):
    File "sentiment.py", line 6, in <module>
    valid_portion=0.1)
    ValueError: too many values to unpack (expected 2)

    Do you have the code for this demo hosted anywhere?

  • Superb video for getting your hands dirty with DL-based text classification. For those interested in looking under the hood, this blog on "Breakthrough approaches for Sentiment Analysis" would be a good read.

    http://blog.paralleldots.com/technology/nlp/breakthrough-research-papers-and-models-for-sentiment-analysis/

  • Why did you not finish the example? Seeing only "Training Step: xxx | total loss: xxx" on screen is trivial. You should add prediction, and also conversion of the numbers from the pkl back into words.

  • If you spent less time on memes and cringe rap lyrics, perhaps you could spend more time explaining this complex topic a little bit better.

  • Siraj Raval
    Good, really good and clean information sharing. And your additional show is something else. Good Job.

  • Hey Siraj, can you help me understand how emoticons are stored and retrieved for sentiment analysis? Reply ASAP!

  • where did you define the architecture of the network here? I see that you did it in network building but I'm trying to grasp what the final architecture looks like.

  • Hi Siraj Raval,

    I am planning to create a Twitter suicidality detection system. Can I get some advice on how to achieve high prediction accuracy? Can I apply deep learning with small data? Can I apply deep learning with human interaction, or human in the loop?

  • The embedding layer ensures that words that have a similar meaning are also represented by similar vectors, right?

  • Can you please tell me which software(s) you installed in order to run this program? I'm facing a problem installing TensorFlow. Thank you. Please reply ASAP.

  • Please make a video on how to use the ReLU function… unless you already did one, in which case can you link me to that video or article?

  • Love your approach, but unfortunately it's not so good for beginners like myself; even if I pause, replay, or play at slower speed, there are so many things I feel I need more explanation on. Nevertheless, I enjoy watching anyway 🙂

  • Hey Siraj! Recently I chose sentiment analysis as our sophomore project, but I have no knowledge of machine learning or anything related to it. Could you suggest how I should get started? I want to analyze Amazon reviews and classify them. Thanks a ton!

  • I am using part of this video as a reference in my university report on automatic sarcasm detection. Thank you, Siraj, for this simple and concise video.

  • Great videos Siraj, brilliant. I'm looking for code to detect gender identification and age group of Tweets and text file contents. It's the study of author profiling. I've found https://pan.webis.de/clef18/pan18-web/author-profiling.html and Github projects like https://github.com/nschaetti/PAN18-Author-Profiling. Do you have a TF project on author profiling?

  • Can I use the sentiment review set to train the model, but then use it on normal messages (not reviews) to get the sentiment?

  • The substance on this channel is there. It's just Siraj's personality, humor choices, and some editing that are annoying. Oh well, at least he is making great videos, and you can't beat the price.
