Orange: Wikipedia

Revision as of 10:03, 29 January 2020

Source: https://orange3-text.readthedocs.io/en/latest/widgets/wikipedia-widget.html


Fetches data from the MediaWiki RESTful web service API.

Input

None

Output

Corpus: A collection of documents from Wikipedia.

The Wikipedia widget retrieves texts from the Wikipedia API. It is useful mostly for teaching and demonstration.

Wikipedia-stamped.png
  • Query parameters:
    • Query word list, where each query is listed in a new line.
    • Language of the query. English is set by default.
    • Number of articles to retrieve per query (range 1-25). Note that querying is done recursively and that disambiguation pages are also retrieved, which can result in more queries than the number set on the slider.
  • Select which features to include as text features.
  • Information on the output.
  • Produce a report.
  • Run query.
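
To make the query parameters above more concrete, here is a minimal Python sketch of how a query word list, a language, and a per-query article limit could be turned into MediaWiki search requests. It is not the widget's own code: the function name `build_search_urls`, its default limit of 10, and the choice of the `list=search` module of the MediaWiki Action API are assumptions made for illustration. The sketch only builds the URLs and does not send any request.

```python
from urllib.parse import urlencode

# Slider range stated in the widget description.
MIN_ARTICLES, MAX_ARTICLES = 1, 25


def build_search_urls(query_text, language="en", articles_per_query=10):
    """Return one MediaWiki search URL per non-empty line of query_text.

    Hypothetical helper for illustration; the default of 10 articles
    is an assumption, not a documented widget default.
    """
    # Clamp the per-query limit to the 1-25 range of the slider.
    limit = max(MIN_ARTICLES, min(MAX_ARTICLES, articles_per_query))
    base = f"https://{language}.wikipedia.org/w/api.php"
    urls = []
    for line in query_text.splitlines():
        query = line.strip()
        if not query:
            continue  # each query sits on its own line; skip blank lines
        params = {
            "action": "query",
            "list": "search",
            "srsearch": query,
            "srlimit": limit,
            "format": "json",
        }
        urls.append(f"{base}?{urlencode(params)}")
    return urls


for url in build_search_urls("Slovenia\nGermany", language="en", articles_per_query=5):
    print(url)
```

Keeping URL construction separate from the network call makes it easy to check the language and limit handling without contacting Wikipedia.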

Example

In this simple example, we use Wikipedia to retrieve the articles on ‘Slovenia’ and ‘Germany’. We then apply the default preprocessing with Preprocess Text and observe the most frequent words in those articles with Word Cloud.

Wikipedia-Example.png
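
Outside the Orange canvas, the idea behind this workflow can be approximated in a few lines of plain Python. The sketch below is only a rough stand-in for Preprocess Text and Word Cloud, not their implementation: the function `most_frequent_words`, the tiny stop-word list, and the two placeholder sentences (used instead of real retrieved articles) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; an assumption, not the list Orange uses.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def most_frequent_words(documents, top_n=5):
    """Lowercase, tokenize, drop stop words, and count the remaining words."""
    counts = Counter()
    for text in documents:
        tokens = re.findall(r"\w+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


# Placeholder sentences standing in for retrieved articles.
docs = [
    "Slovenia is a country in Central Europe.",
    "Germany is a country in Central Europe and a member of the European Union.",
]
print(most_frequent_words(docs, top_n=3))
```

A word cloud displays the same kind of counts visually, scaling each word by its frequency.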

Wikipedia works just like any other corpus widget (NY Times, Twitter) and can be used accordingly.



References

Interesting Links