Orange: Twitter


Latest revision as of 04:05, 12 April 2020

Source: https://orange3-text.readthedocs.io/en/latest/widgets/twitter-widget.html


The Twitter widget fetches data using the Twitter Search API.

Input

None

Output

Corpus: A collection of tweets from the Twitter API.

The Twitter widget lets you query tweets through the Twitter API. You can query by content, author, or both, and accumulate the results if you want to build a larger data set. The widget supports only the REST API and allows queries going back at most two weeks.

Twitter-stamped.png
  • To begin your queries, insert Twitter key and secret. They are securely saved in your system keyring service (like Credential Vault, Keychain, KWallet, etc.) and won’t be deleted when clearing widget settings. You must first create a Twitter app to get API keys.
Twitter-key.png
  • Set query parameters:
    • Query word list: list desired queries, one per line. Queries are automatically joined by OR.
    • Search by: specify whether you want to search by content, author or both. If searching by author, you must enter proper Twitter handle (without @) in the query list.
    • Language: set the language of retrieved tweets. Any will retrieve tweets in any language.
    • Max tweets: set the top limit of retrieved tweets. If box is not ticked, no upper bound will be set - widget will retrieve all available tweets.
    • Allow retweets: if ‘Allow retweets’ is checked, retweeted tweets will also appear on the output. This might duplicate some results.
    • Collect results: if ‘Collect results’ is ticked, the widget appends the results of new queries to the previous ones. Enter new queries, run Search, and the new results are added to those already collected.
  • Define which features to include as text features.
  • Information on the number of tweets on the output.
  • Run query.
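
The query parameters above can be pictured as a single search string. The following is a minimal sketch, not the widget's actual code: the function name `build_query` and its arguments are hypothetical, and quoting multi-word phrases is an assumption. It uses the standard Twitter search operators `OR`, `from:` and `-filter:retweets`.

```python
def build_query(words, search_by="content", allow_retweets=True):
    """Combine query-list entries into one search string (illustrative only).

    search_by is "content", "author", or "both".
    """
    terms = []
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank lines in the query list
        if search_by in ("content", "both"):
            # Assumption: quote multi-word entries so they match as a phrase.
            terms.append(f'"{word}"' if " " in word else word)
        if search_by in ("author", "both"):
            # Handles are entered without the leading @.
            terms.append(f"from:{word.lstrip('@')}")
    # Entries are joined by OR, as described for the query word list.
    query = " OR ".join(terms)
    if not allow_retweets and query:
        query = f"({query}) -filter:retweets"
    return query


print(build_query(["data mining", "machine learning"]))
print(build_query(["HillaryClinton", "realDonaldTrump"], search_by="author"))
```

The first call corresponds to a content search for two phrases, the second to an author search for two handles.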

Examples

Using the Twitter widget, we will first try a simple query. We search for tweets containing either ‘data mining’ or ‘machine learning’ in the content and allow retweets. We further limit the search to 100 tweets in English.

First, we check the output in the Corpus Viewer widget to get an initial idea of the results. Then we process the tweets with the Preprocess Text widget: lowercasing, URL removal, the tweet tokenizer, and removal of stopwords and punctuation. The best way to view the results is with the Word Cloud widget, which displays the most popular words in the field of data mining and machine learning over the past two weeks.
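
To make the preprocessing steps concrete, here is a rough standard-library sketch of lowercasing, URL removal, tweet-style tokenization, and stopword and punctuation removal, followed by the word counts a word cloud is drawn from. It is not how the Preprocess Text widget is implemented; the regular expressions, the tiny `STOPWORDS` set, and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "for", "with"}

URL_RE = re.compile(r"https?://\S+")
# Keep hashtags and mentions as single tokens; punctuation never matches.
TOKEN_RE = re.compile(r"[#@]?\w+")


def preprocess(tweet):
    """Lowercase, strip URLs, tokenize, and drop stopwords and punctuation."""
    text = tweet.lower()
    text = URL_RE.sub("", text)
    tokens = TOKEN_RE.findall(text)
    return [tok for tok in tokens if tok not in STOPWORDS]


def word_frequencies(tweets):
    """Count tokens over all tweets; a word cloud sizes words by these counts."""
    counts = Counter()
    for tweet in tweets:
        counts.update(preprocess(tweet))
    return counts


print(preprocess("Machine Learning is FUN! https://example.com #ML"))
```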

Twitter-Example1.png


Using the Twitter widget, we query tweets from Hillary Clinton and Donald Trump from the 2016 presidential campaign. We then use the Preprocess Text widget to get suitable tokens on the output. Connecting Preprocess Text to the Bag of Words widget creates a table with words as features and their counts as values. A quick check in the Word Cloud widget gives an idea of the results.
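
The table that Bag of Words produces can be sketched in a few lines of plain Python: one column per word, one row per tweet, and word counts as values. This is only an illustration under simple assumptions (already-tokenized input, raw counts); the function name `bag_of_words` and the toy tokens are invented for the example and are not taken from the widget.

```python
from collections import Counter


def bag_of_words(token_lists):
    """Return (vocabulary, rows) where rows[i][j] counts vocabulary[j] in document i."""
    # Sorted vocabulary gives a stable column order.
    vocabulary = sorted({tok for tokens in token_lists for tok in tokens})
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


# Made-up token lists standing in for two preprocessed tweets.
docs = [["great", "again", "great"], ["stronger", "together"]]
vocabulary, rows = bag_of_words(docs)
print(vocabulary)
for row in rows:
    print(row)
```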

Next, we want to predict the author of each tweet. With the Select Columns widget, we set ‘Author’ as the target variable. We then connect Select Columns to the Test & Score widget. We use the Logistic Regression widget as the learner, which we also connect to Test & Score.

We can observe the results of the author predictions directly in the Test & Score widget. The AUC score is quite good. It seems that we can, to some extent, predict who wrote a tweet based on its content.
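
For readers unfamiliar with AUC: it can be read as the probability that a randomly chosen tweet by one author receives a higher predicted score for that author than a randomly chosen tweet by the other. The sketch below computes it by comparing every positive/negative pair. The labels and scores are made-up toy values, and this is not necessarily how Test & Score computes the figure.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = first author, 0 = second author; scores are invented.
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

A value of 0.5 corresponds to random guessing and 1.0 to a perfect ranking.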

Twitter-Example2.png

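YouTube

  • YOUTUBE: Membuat API The Guardian, NY Times, Twitter (https://www.youtube.com/watch?v=gIW_OzSx4_M)
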