Difference between revisions of "Orange: Import Documents"

From OnnoWiki

Revision as of 10:14, 24 January 2020

Source: https://orange3-text.readthedocs.io/en/latest/widgets/importdocuments.html


Import text documents from folders.

Inputs

   None

Outputs

   Corpus: A collection of documents from the local machine.

The Import Documents widget retrieves text files from folders and creates a corpus. It reads .txt, .docx, .odt, .pdf and .xml files. If the selected folder contains subfolders, their names are used as class labels.
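To make the file-type rule concrete, here is a minimal sketch of how a folder could be scanned for the extensions listed above. It uses only the Python standard library and is not the widget's own code; the names SUPPORTED_EXTENSIONS, is_supported and find_candidates are hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions taken from the widget description; the set name is hypothetical.
SUPPORTED_EXTENSIONS = {".txt", ".docx", ".odt", ".pdf", ".xml"}


def is_supported(path):
    """Return True if the file extension is one the widget is documented to read.

    Case-insensitive matching is an assumption made for this sketch.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def find_candidates(root):
    """List supported files under root, including subfolders, in sorted order."""
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and is_supported(p)
    )
```

Calling find_candidates on a folder gives a rough preview of which files a loader restricted to these extensions would consider.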

Import-Documents-stamped.png
   1. Folder being loaded.
   2. Load folder from a local machine.
   3. Reload the data.
   4. Number of documents retrieved.

If the widget cannot read a file, that file is skipped. Files that were read successfully still appear on the output.
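The skip-on-failure behaviour can be illustrated with a small standard-library sketch. It handles plain .txt files only and assumes UTF-8 encoding; the function name read_text_files and the choice to return the skipped files alongside the documents are assumptions for illustration, not the widget's implementation.

```python
from pathlib import Path


def read_text_files(root):
    """Read every .txt file under root and return (documents, skipped).

    documents is a list of (file name, text) pairs; skipped is a list of
    (file name, error message) pairs for files that could not be read.
    """
    documents, skipped = [], []
    for path in sorted(Path(root).rglob("*.txt")):
        if not path.is_file():
            continue
        try:
            # Assumption: files are UTF-8 encoded plain text.
            documents.append((path.name, path.read_text(encoding="utf-8")))
        except (OSError, UnicodeDecodeError) as err:
            # An unreadable file is recorded and skipped; loading continues.
            skipped.append((path.name, str(err)))
    return documents, skipped
```

One failing file does not abort the whole load: the remaining files are still returned, which mirrors the behaviour described above.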

Example

To retrieve the data, click the folder icon on the right side of the widget and select the folder you wish to turn into a corpus. Once loading finishes, the widget shows how many documents it retrieved. To inspect them, connect the widget to Corpus Viewer. We’ve used a set of Kennedy’s speeches in plain text format.

Import-Documents-Example1.png

Now let us try it with subfolders. We have placed Kennedy’s speeches in two folders, pre-1962 and post-1962. If we load the parent folder, these two subfolders will be used as class labels. Check the output of the widget in a Data Table.

Import-Documents-Example2.png
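The subfolder-to-label mapping from this example can be sketched as follows. This is a hedged illustration, not the widget's code: the helper names class_label and label_documents are hypothetical, only .txt files are considered, and taking the top-level subfolder name as the label (with no label for files directly in the parent folder) is an assumption.

```python
from pathlib import Path


def class_label(path, root):
    """Return the name of the top-level subfolder of root that contains path.

    Files placed directly in root get None (an assumption of this sketch).
    """
    relative = Path(path).relative_to(root)
    return relative.parts[0] if len(relative.parts) > 1 else None


def label_documents(root):
    """Pair each .txt file under root with its class label."""
    return [
        (p.name, class_label(p, root))
        for p in sorted(Path(root).rglob("*.txt"))
        if p.is_file()
    ]
```

For a parent folder containing pre-1962 and post-1962, each speech would be paired with the name of the subfolder it sits in, which is the kind of label column one would expect to see in the Data Table.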


References

Interesting Links