Orange: Merge Data

Source: https://docs.biolab.si//3/visual-programming/widgets/data/mergedata.html

Merges two datasets, based on the values of selected attributes.

Input

Data: input dataset
Extra Data: additional dataset

Output

Data: dataset with features added from extra data

The Merge Data widget merges two datasets horizontally, based on the values of selected attributes (columns). It requires two inputs, Data and Extra Data. Rows from the two datasets are matched by the values of pairs of attributes chosen by the user. The widget produces one output: the instances from the Data input, to which the attributes (columns) from the Extra Data input are appended.

If the selected attribute pair does not contain unique values (in other words, the attributes have duplicate values), the widget gives a warning. Instead, one can match by more than one attribute. Click the plus icon to add another attribute to merge on. The combination of values across the chosen attributes must be unique for each individual row.
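The uniqueness requirement can be checked outside Orange as well. The following is a minimal pandas sketch, not the widget's own code; the table, its column names (sample, batch) and the helper is_unique_key are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical table: "sample" alone repeats, but (sample, batch) does not.
table = pd.DataFrame({
    "sample": ["s1", "s1", "s2"],
    "batch": [1, 2, 1],
    "value": [0.5, 0.7, 0.9],
})


def is_unique_key(frame, columns):
    """Return True if no two rows share the same values in `columns`."""
    return not frame.duplicated(subset=columns).any()


print(is_unique_key(table, ["sample"]))
print(is_unique_key(table, ["sample", "batch"]))
```

Here the single column fails the check while the pair of columns passes, which is the situation in which matching on more than one attribute helps.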

Information on main data.
Information on data to append.
Merging type:
- Append columns from Extra Data outputs all rows from the Data, augmented by the columns in the Extra Data. Rows without matches are retained, even where the data in the extra columns are missing.
- Find matching pairs of rows outputs rows from the Data, augmented by the columns in the Extra Data. Rows without matches are removed from the output.
- Concatenate tables treats both data sources symmetrically. The output is similar to the first option, except that non-matched values from Extra Data are appended at the end.
List of attributes from Data input.
List of attributes from Extra Data input.
Produce a report.

Merging Types

Append Columns from Extra Data (left join)

Columns from the Extra Data are added to the Data. Instances with no matching rows will have missing values added.

For example, the first table may contain city names and the second a list of cities and their coordinates. The columns with coordinates would then be appended to the data with city names. Where a city name cannot be matched, missing values will appear.

In our example, the first Data input contains 6 cities, but the Extra Data does not provide Lat and Lon values for Bratislava, so those fields will be empty.
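For readers who know pandas, the same behaviour can be approximated with a left merge. This is a sketch of the analogous operation, not Orange's implementation; the tables are shortened, hypothetical stand-ins for the two inputs, and the Population numbers are placeholders rather than values from the Orange example.

```python
import pandas as pd

# Hypothetical stand-in for the Data input (placeholder populations, in thousands).
data = pd.DataFrame({
    "City": ["Ljubljana", "Bratislava", "Vienna"],
    "Population": [300, 400, 1900],
})
# Hypothetical stand-in for the Extra Data input (approximate coordinates).
extra = pd.DataFrame({
    "City": ["Ljubljana", "Vienna", "Belgrade"],
    "Lat": [46.05, 48.21, 44.79],
    "Lon": [14.51, 16.37, 20.45],
})

# Left join: every row of `data` is kept; Lat/Lon are filled in where City matches.
left = data.merge(extra, on="City", how="left")
print(left)
```

Bratislava stays in the result with missing Lat and Lon, and Belgrade does not appear because it is not in the first table.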

Find matching pairs of rows (inner join)

Only the rows that match will be present on the output, with the Extra Data columns appended. Rows without a match are removed.

In our example, Bratislava from the Data input has no Lat and Lon values, while Belgrade from the Extra Data cannot be found in the City column we are merging on. Hence both instances are removed; only the intersection of instances is sent to the output.
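A pandas inner merge behaves in a comparable way. The sketch below is an analogy rather than the widget's code, and it uses hypothetical, shortened tables with placeholder Population values.

```python
import pandas as pd

# Hypothetical stand-ins for the Data and Extra Data inputs.
data = pd.DataFrame({
    "City": ["Ljubljana", "Bratislava", "Vienna"],
    "Population": [300, 400, 1900],  # placeholder values, in thousands
})
extra = pd.DataFrame({
    "City": ["Ljubljana", "Vienna", "Belgrade"],
    "Lat": [46.05, 48.21, 44.79],
    "Lon": [14.51, 16.37, 20.45],
})

# Inner join: only cities present in both tables survive.
inner = data.merge(extra, on="City", how="inner")
print(inner)
```

Both Bratislava (only in the first table) and Belgrade (only in the second) are dropped.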

Concatenate tables (outer join)

The rows from both the Data and the Extra Data will be present on the output. Where rows cannot be matched, missing values will appear.

In our example, both Bratislava and Belgrade are now present. Bratislava will have missing Lat and Lon values, while Belgrade will have a missing Population value.
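In pandas terms this corresponds roughly to an outer merge. The sketch below is an analogy on hypothetical, shortened tables with placeholder Population values; the indicator column is a pandas feature, not something the widget outputs, and pandas may order the rows differently from Orange.

```python
import pandas as pd

# Hypothetical stand-ins for the Data and Extra Data inputs.
data = pd.DataFrame({
    "City": ["Ljubljana", "Bratislava", "Vienna"],
    "Population": [300, 400, 1900],  # placeholder values, in thousands
})
extra = pd.DataFrame({
    "City": ["Ljubljana", "Vienna", "Belgrade"],
    "Lat": [46.05, 48.21, 44.79],
    "Lon": [14.51, 16.37, 20.45],
})

# Outer join: keep rows from both tables; `_merge` records where each row came from.
outer = data.merge(extra, on="City", how="outer", indicator=True)
print(outer)
```

The `_merge` column makes it easy to see which rows were matched and which came from only one side.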

Row index

Data will be merged in the same order as they appear in the table. Row number 1 from the Data input will be joined with row number 1 from the Extra Data input. Row numbers are assigned by Orange based on the original order of the data instances.
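Matching by row position can be imitated in pandas by discarding index labels and placing the tables side by side. This is only a sketch of the idea under the assumption that both tables have the same number of rows; the tables are hypothetical.

```python
import pandas as pd

# Hypothetical tables whose index labels do not agree with each other.
data = pd.DataFrame({"City": ["Ljubljana", "Vienna"]}, index=[10, 20])
extra = pd.DataFrame({"Lat": [46.05, 48.21]}, index=[7, 8])

# Drop the existing labels so that rows are paired purely by position:
# first row with first row, second with second, and so on.
by_position = pd.concat(
    [data.reset_index(drop=True), extra.reset_index(drop=True)],
    axis=1,
)
print(by_position)
```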

Instance ID

This is a more complex option. Sometimes data is transformed during the analysis and the domain is no longer the same. Nevertheless, the original row indices are still present in the background (Orange remembers them). In this case one can merge on instance ID. For example, suppose you transformed the data with PCA, visualized it in the Scatter Plot, selected some data instances, and now wish to see the original information for the selected subset. Connect the output of Scatter Plot to Merge Data, add the original data set as Extra Data, and merge by Instance ID.
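The idea of a row identity that survives a transformation can be illustrated with a pandas index. This sketch is an analogy only: Orange's instance IDs are internal to Orange, the simple column arithmetic below merely stands in for PCA, and all table and column names are hypothetical.

```python
import pandas as pd

# Hypothetical original table; the DataFrame index plays the role of the instance ID.
original = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [4.0, 3.0, 2.0, 1.0],
})

# Stand-in for a transformation such as PCA: new columns, same row identity.
transformed = pd.DataFrame(
    {
        "comp1": original["x"] + original["y"],
        "comp2": original["x"] - original["y"],
    },
    index=original.index,
)

# Stand-in for selecting points in a scatter plot.
selected = transformed[transformed["comp2"] > 0]

# Recover the original columns for the selected rows by joining on the index.
recovered = selected.join(original, how="left")
print(recovered)
```

Although the transformed table no longer contains the original columns, the shared index is enough to bring them back for the selected rows.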

Merge by two or more attributes

Sometimes our data instances are unique with respect to a combination of columns, not a single column. To merge by more than a single column, add the Row matching condition by pressing plus next to the matching condition. To remove it, press the x.

In the example below, we are merging by the student column and the class column.

Say we have two data sets with student names and the class they’re in. The first data set contains the students’ grades and the second the elective course they have chosen. Unfortunately, there are two Jacks in our data, one from class A and the other from class B. The same goes for Jane.

To distinguish between the two, we can match rows on both the student’s name and the class.
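The effect of the second matching column can be seen in a small pandas sketch. It is an analogy, not the widget's code, and the grades and elective names are made up for illustration.

```python
import pandas as pd

# Hypothetical tables: two Jacks and two Janes, one of each per class.
grades = pd.DataFrame({
    "student": ["Jack", "Jack", "Jane", "Jane"],
    "class": ["A", "B", "A", "B"],
    "grade": [5, 3, 4, 2],
})
electives = pd.DataFrame({
    "student": ["Jack", "Jack", "Jane", "Jane"],
    "class": ["A", "B", "A", "B"],
    "elective": ["Art", "Music", "Chess", "Drama"],
})

# Matching on the name alone pairs each Jack with both Jacks (same for Jane).
by_name = grades.merge(electives, on="student", how="left")

# Matching on name and class pairs each student with exactly one row.
by_both = grades.merge(electives, on=["student", "class"], how="left")

print(len(by_name), len(by_both))
```

Matching on the name alone yields eight rows because every student is paired with both namesakes, while matching on both columns yields the expected four.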

Example

Merging two datasets appends new attributes to the original data, based on a selected common attribute. In the example below, we wanted to merge the zoo.tab file, which contains only factual data, with zoo-with-images.tab, which contains images. Both files share a common string attribute names. We create a workflow connecting the two files: the zoo.tab data is connected to the Data input of the Merge Data widget, and the zoo-with-images.tab data to the Extra Data input. The output of the Merge Data widget is then connected to the Data Table widget, which shows the merged data with the image attributes added to the original data.

The case where we want to include all instances in the output, even those where no match by attribute names was found, is shown in the following workflow.