NLTK: Sentiment Strength Detection in Bahasa Indonesia
Revision as of 10:36, 25 February 2017 by Onnowpurbo
SentiStrengthID
Sentiment Strength Detection in Bahasa Indonesia. This is an unsupervised version of SentiStrength (http://sentistrength.wlv.ac.uk/) for Bahasa Indonesia. Core features:
- Sentiment Lookup
- Negation Word Lookup
- Booster Word Lookup
- Emoticon Lookup
- Idiom Lookup
- Question Word Lookup
- Slang Word Lookup
- Spelling Correction (optional) using Peter Norvig's spelling corrector (http://norvig.com/spell-correct.html)
- Negative emotion is ignored in questions
- Exclamation marks count as +2
- Repeated punctuation boosts sentiment
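
How several of the lookups above combine can be sketched in a few lines. This is a minimal illustration, not the SentiStrengthID implementation: the lexicons, the `score` function, and the rule that boosters and negations only act on directly adjacent words are all assumptions made for the example. It returns a positive strength (1 to 5) and a negative strength (-1 to -5), following the dual scale used by SentiStrength.

```python
import re

# Hypothetical toy lexicons for illustration only; the real project
# loads much larger word lists from its own data files.
SENTIMENT = {"senang": 3, "bagus": 2, "benci": -4, "buruk": -2}
NEGATION = {"tidak", "bukan"}
BOOSTER = {"sangat": 1, "banget": 1}
QUESTION = {"apa", "kenapa", "siapa"}


def score(text):
    """Return (positive, negative) strengths for a short text."""
    tokens = re.findall(r"\w+", text.lower())
    is_question = "?" in text or any(t in QUESTION for t in tokens)
    pos, neg = 1, -1  # neutral baseline on both scales
    for i, tok in enumerate(tokens):
        if tok not in SENTIMENT:
            continue
        s = SENTIMENT[tok]
        prev_tok = tokens[i - 1] if i > 0 else ""
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
        # Booster lookup: an adjacent booster strengthens the term.
        boost = BOOSTER.get(prev_tok, 0) + BOOSTER.get(next_tok, 0)
        s = s + boost if s > 0 else s - boost
        # Negation lookup: a directly preceding negation flips the polarity.
        if prev_tok in NEGATION:
            s = -s
        pos = max(pos, s)
        neg = min(neg, s)
    # Negative emotion is ignored in questions.
    if is_question:
        neg = -1
    # Clamp to the 1..5 and -1..-5 ranges.
    return min(pos, 5), max(neg, -5)


print(score("filmnya sangat bagus"))    # booster before a positive word
print(score("saya tidak senang"))       # negated positive word
print(score("kenapa kamu benci dia?"))  # negative word inside a question
```

Emoticon, idiom, and slang lookups would follow the same pattern with their own tables; they are left out here to keep the sketch short.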
 
Ignored rules:
- Repeated letters (more than 2) boost the sentiment score. This rule is not applied because my own pre-processing step removes a word's extra characters.
- Score of +2, -2 for the word "miss". This rule does not apply in Bahasa Indonesia.
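
The pre-processing step mentioned above is not shown in this page. As a rough sketch of why it makes the repeated-letter rule redundant, assume a normalizer that collapses any run of three or more identical characters to one; the function name `collapse_repeats` and the threshold are hypothetical. Once elongated words are normalized this way, no repeated letters are left for a boosting rule to count.

```python
import re


def collapse_repeats(word):
    """Collapse runs of three or more identical characters to a single one."""
    return re.sub(r"(.)\1{2,}", r"\1", word)


for w in ["baguuuusss", "mantaaap", "maaf"]:
    print(w, "->", collapse_repeats(w))
```

The threshold of three leaves legitimate double letters, as in "maaf", untouched.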
Warning!
This is a work in progress, experimental code for my master's thesis.
Modify Source Code
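import argparse

# sentiStrength and spellCheck are assumed to be the classes defined in the
# SentiStrengthID source file that this snippet is added to.

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--infile', default='', help='input filename')
    return parser.parse_args()

def main():
    args = parse_args()
    # Read the input file and split it into lines, one text per line.
    # (Iterating over the raw string would yield single characters.)
    with open(args.infile, 'r') as infile:
        lines = infile.read().splitlines()
    ss = sentiStrength()
    sc = spellCheck()  # instantiated as in the original; not used below
    for t in lines:
        # Score each line; print() with one argument works in Python 2 and 3
        print(ss.main(t))
    print("=====================")
    # Print the overall score accumulated across all lines
    print(ss.getSentimenScore())

if __name__ == '__main__':
    main()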