Orange: Loading Your Data

Revision as of 06:20, 9 January 2020

Source: https://docs.biolab.si//3/visual-programming/loading-your-data/index.html


Orange uses its own data format, but it can also handle Excel files and comma- or tab-delimited data files. The input dataset is usually a table, with data instances (samples) in rows and data attributes in columns. Attributes can be of different types (numeric, categorical, datetime, and text) and have assigned roles (input feature, meta attribute, and class). The type and role of each data attribute can be provided in the header of the data table. They can also be changed in the File widget, while the role can additionally be modified with the Select Columns widget.

In a Nutshell

  • Orange can import comma- or tab-delimited data files, native Excel files, and Google Sheets documents. Use the File widget to load the data and, if necessary, define the class and meta attributes.
  • Types and roles can be set in the File widget.
  • Attribute names in the column header can be preceded by a label followed by a hash. Use c for class, m for meta attribute, i to ignore a column, w for a column of instance weights, and C, D, T, S for the continuous, discrete, time, and string attribute types. Examples: C#mph, mS#name, i#dummy.
  • An alternative to the hash notation is Orange's native format with three header rows: the first with the attribute names, the second specifying the type (continuous, discrete, time, or string), and the third providing information on the attribute role (class, meta, weight, or ignore).

Data from Excel

Here is an example dataset (sample.xlsx) as entered in Excel:

Spreadsheet1.png

The file contains one header row, eight data instances (rows), and seven data attributes (columns). Empty cells in the table denote missing data entries. Rows represent genes; their function (class) is given in the first column and their name in the second. The remaining columns store measurements that characterize each gene. With this data we could, say, develop a classifier that predicts gene function from its characteristic measurements.

Let us start with a simple workflow that reads the data and displays it in a table:

File-data-table-workflow.png

To load the data, open the File widget (double click on the icon of the widget), click on the file browser icon (“…”) and locate the downloaded file (called sample.xlsx) on your disk:

../_images/File.png

File Widget: Setting the Attribute Type and Role

The File widget sends the data to the Data Table. Double click the Data Table to see its contents:

../_images/table-widget.png

Orange correctly assumed that the column with gene names is meta information, which the Data Table displays in columns shaded light brown. It has not guessed that function, the first non-meta column in our data file, is a class column. To correct this, we can adjust the attribute role in the column display of the File widget (below). Double-click the feature label in the function row and select target instead. This sets the function attribute as our target (class) variable.

../_images/File-set-feature-kind.png

You can also change the attribute type from nominal to numeric, from string to datetime, and so on. Naturally, data values have to suit the specified attribute type. Datetime accepts only values in ISO 8601 format, e.g. 2016-01-01 16:16:01. Orange also assumes that an attribute is numeric if it has several different values; otherwise the attribute is considered nominal. All other types are considered strings and are automatically categorized as meta attributes.

Change of attribute roles and types should be confirmed by clicking the Apply button.

Select Columns: Setting the Attribute Role

Another way to set the data role is to feed the data to the Select Columns widget:

../_images/select-columns-schema.png

Opening Select Columns reveals Orange’s classification of attributes. We would like all of our continuous attributes to be data features, gene function to be our target variable and gene names considered as meta attributes. We can obtain this by dragging the attribute names around the boxes in Select Columns:

../_images/select-columns-start.png

To correctly reassign attribute roles, drag the attribute named function to the Class box and the attribute named gene to the Meta Attribute box. The Select Columns widget should now look like this:

../_images/select-columns-reassigned.png

Changes of attribute roles in the Select Columns widget should be confirmed by clicking the Apply button. The data from this widget is fed into the Data Table, which now renders the data just the way we intended:

../_images/data-table-with-class1.png

We could also define the domain for this dataset in a different way. Say, we could make the dataset ready for regression, and use heat 0 as a continuous class variable, keep gene function and name as meta variables, and remove heat 10 and heat 20 from the dataset:

../_images/select-columns-regression.png

By setting the attributes as above, the rendering of the data in the Data Table widget gives the following output:

../_images/data-table-regression1.png

Header with Attribute Type Information

Consider again the sample.xlsx dataset. This time we will augment the names of the attributes with prefixes that define attribute type (continuous, discrete, time, string) and role (class or meta attribute). Prefixes are separated from the attribute name with a hash sign (“#”). Prefixes for attribute roles are:

c: class attribute
m: meta attribute
i: ignore the attribute
w: instance weights

and for the type:

C: Continuous
D: Discrete
T: Time
S: String

This is what the header with augmented attribute names looks like in Excel (sample-head.xlsx):

../_images/spreadsheet-simple-head1.png

We can again use a File widget to load this dataset and then render it in the Data Table:

../_images/select-cols-simplified-header.png

Notice that the attributes we have ignored (label “i” in the attribute name) are not present in the dataset.
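
To make the prefix notation concrete, here is a minimal Python sketch that splits a header cell into its name, type, and role. It is not Orange's own parser: the function name parse_header and the fallback behaviour (a cell without a valid prefix is kept as a plain name) are assumptions made for illustration. The example cells C#mph, mS#name, and i#dummy are the ones used in the summary above.

```python
# Flag letters as listed in the text: lowercase for roles, uppercase for types.
ROLES = {"c": "class", "m": "meta", "i": "ignore", "w": "weight"}
TYPES = {"C": "continuous", "D": "discrete", "T": "time", "S": "string"}


def parse_header(cell):
    """Split a cell such as 'mS#name' into (name, type, role).

    Hypothetical helper: type or role is None when the prefix does not
    specify it, and a cell without a valid prefix is returned unchanged.
    """
    prefix, sep, name = cell.partition("#")
    known = set(ROLES) | set(TYPES)
    # No hash, an empty prefix, or unknown letters: treat as a plain name.
    if not sep or not prefix or any(ch not in known for ch in prefix):
        return cell, None, None
    role = next((ROLES[ch] for ch in prefix if ch in ROLES), None)
    kind = next((TYPES[ch] for ch in prefix if ch in TYPES), None)
    return name, kind, role


for cell in ["C#mph", "mS#name", "i#dummy", "heat 0"]:
    print(cell, "->", parse_header(cell))
```

A column whose role comes back as ignore would then simply be dropped when loading, which matches what the Data Table shows for the columns labelled with i.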

Three-Row Header Format

Orange’s legacy native data format is a tab-delimited text file with three header rows. The first row lists the attribute names, the second row defines their type (continuous, discrete, time and string, or abbreviated c, d, t, and s), and the third row an optional role (class, meta, weight, or ignore). Here is an example:

../_images/excel-with-tab1.png
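
Because the example above is shown as an image, the following Python sketch spells out the same layout as text. It uses only the standard csv module and does not call Orange. The column names follow the gene example, but the data values, the helper names to_tab and read_header, and the choice to report an empty role as "feature" are assumptions for illustration.

```python
import csv
import io

# Three header rows followed by data rows (data values are made up).
rows = [
    ["function", "gene", "heat 0", "heat 10"],           # row 1: names
    ["discrete", "string", "continuous", "continuous"],  # row 2: types
    ["class", "meta", "", ""],                           # row 3: optional roles
    ["f1", "gene-a", "-0.5", "0.2"],
    ["f2", "gene-b", "", "0.1"],                         # empty cell: missing value
]


def to_tab(table):
    """Serialize the rows as tab-delimited text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(table)
    return buf.getvalue()


def read_header(text):
    """Read the three header rows back into one dict per column."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    names, types, roles = next(reader), next(reader), next(reader)
    return [
        {"name": n, "type": t, "role": r or "feature"}  # assumed default role
        for n, t, r in zip(names, types, roles)
    ]


text = to_tab(rows)
print(text)
for column in read_header(text):
    print(column)
```

Saving the printed text to a file with a .tab extension would give a file in this three-row layout; whether a particular Orange version accepts it unchanged should be checked in the File widget.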

Data from Google Sheets

Orange can read data from Google Sheets, as long as it conforms to the data presentation rules we have presented above. In Google Sheets, copy the shareable link (Share button, then Get shareable link) and paste it in the Data File / URL box of the File widget. For a taste, here’s one such link you can use: http://bit.ly/1J12Tdp, and the way we have entered it in the File widget:

../_images/File-Google-Sheet.png

Data from LibreOffice

If you are using LibreOffice, simply save your files in Excel (.xlsx) format (available from the drop-down menu under Save As Type).

../_images/saving-tab-delimited-files.png

Datetime Format

To avoid ambiguity, Orange supports date and/or time formatted in one of the ISO 8601 formats. For example, the following values are all valid:

2016
2016-12-27
2016-12-27 14:20:51
16:20
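
Before loading a file, it can help to check that a datetime column only contains values of the shapes listed above. The sketch below does this with Python's standard datetime.strptime. It covers only the four shapes shown in this section, not the whole of ISO 8601, and it is not Orange's own parser; the names FORMATS and matching_format are hypothetical.

```python
from datetime import datetime

# One strptime pattern per example shape in the text (an assumption:
# Orange itself may accept further ISO 8601 variants).
FORMATS = [
    "%Y-%m-%d %H:%M:%S",  # 2016-12-27 14:20:51
    "%Y-%m-%d",           # 2016-12-27
    "%H:%M",              # 16:20
    "%Y",                 # 2016
]


def matching_format(value):
    """Return the first pattern that parses value, or None if none does."""
    for fmt in FORMATS:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue
        return fmt
    return None


for value in ["2016", "2016-12-27", "2016-12-27 14:20:51", "16:20", "27/12/2016"]:
    print(value, "->", matching_format(value))
```

The last value uses a day/month/year layout, which is exactly the kind of ambiguous form that none of the patterns accept.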

References

Related Links