Miscellaneous Functions
keywords: ALFA, miscellaneous functions
Many of the data handling, statistics, visualization, and machine learning commands in Alfarvis need supporting variables, such as reference variables and row and column labels, to generate useful results. This page summarizes the commands used to set and clear these supporting variables.
1. Ground truth / Reference
Supervised machine learning algorithms, as well as many statistical and visualization commands, require a reference variable. For example, in tumor analysis, the reference variable could be whether the tumor is benign or malignant. Reference variables are usually categorical variables, as discussed in data filtering.
Here is an example of its usage. Please refer to the machine learning tutorials for more use cases.
load cancer data set
set ground truth to tumor diagnosis OR
set reference to tumor diagnosis OR
set gt to tumor diagnosis
This can be followed by machine learning:
train support vector machine on the cancer dataset
save model as svm1
load test data set
run svm1 on the test data set
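To make the role of the reference variable concrete, here is a minimal plain-Python sketch of the idea: one column is set aside as the labels and the remaining columns serve as features for a supervised learner. This is only an illustration, not how Alfarvis is implemented; the `records` data, the column names, and the `split_reference` helper are all hypothetical.

```python
# Hypothetical records; column names and values are made up for illustration.
records = [
    {"radius": 17.9, "texture": 10.4, "tumor diagnosis": "malignant"},
    {"radius": 11.4, "texture": 20.4, "tumor diagnosis": "benign"},
    {"radius": 20.6, "texture": 17.8, "tumor diagnosis": "malignant"},
]


def split_reference(rows, reference):
    """Separate the reference column (labels) from the other columns (features)."""
    features = [
        {name: value for name, value in row.items() if name != reference}
        for row in rows
    ]
    labels = [row[reference] for row in rows]
    return features, labels


# Conceptually, this is what "set reference to tumor diagnosis" selects.
features, labels = split_reference(records, "tumor diagnosis")
print(labels)
print(features[0])
```

A supervised learner would then be trained on `features` with `labels` as the target, which is why the reference must be set before the training command is issued.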
2. Row Labels
Sometimes we require all the observations in our dataset to have row labels. These are especially useful when running statistics. For example, if I have a dataset of IMDB ratings of all the movies that were ever made and I want to identify the movie with the highest rating, I can use Alfarvis and row labels as follows:
load movie dataset
set row label to movie names
find the maximum imdb rating
This will return the value of the maximum imdb rating and also the corresponding movie name (as it is our row label).
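The effect of a row label on a statistic such as the maximum can be sketched in plain Python as follows. This is an illustrative assumption about the behavior described above, not Alfarvis's own code; the `movies` data and the `max_with_label` helper are hypothetical.

```python
# Hypothetical movie data; titles and ratings are made up for illustration.
movies = [
    {"movie name": "Movie A", "imdb rating": 7.1},
    {"movie name": "Movie B", "imdb rating": 8.4},
    {"movie name": "Movie C", "imdb rating": 6.3},
]


def max_with_label(rows, column, row_label):
    """Return the largest value in `column` and the row label of that row."""
    best = max(rows, key=lambda row: row[column])
    return best[column], best[row_label]


# Mirrors "set row label to movie names" followed by "find the maximum imdb rating".
value, label = max_with_label(movies, "imdb rating", "movie name")
print(value, label)
```

Without a row label, only the value could be reported; with it, the result can be tied back to a specific observation.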
NOTE: You can also write setrl, set row, etc., as long as it is unambiguous for ALFA to resolve your intent.
The row labels and ground truth can be cleared using the clear command, e.g.
clear row labels
clear reference