Usage
To use LACE in a project:
import lace # or
from lace import cliff, morph, lace1, add_to_bin, lace2_simulator
The CLIFF function:
cliff(attribute_names, data_matrix, independent_attrs, objective_attr, objective_as_binary=False, cliff_percentage=0.4)
Parameter | Description
---|---
attribute_names | The attribute names. These should match the columns of data_matrix.
data_matrix | The data to trim.
independent_attrs | The independent attributes in the dataset. Note: identifier columns such as ‘name’ or ‘id’ should typically not be listed as independent attributes.
objective_attr | The attribute to be treated as the objective.
objective_as_binary | Whether to treat the objective as a binary attribute. Default: False
cliff_percentage | The fraction of records to retain. Default: 0.4
return | The surviving (most valuable) records.
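To give a feel for what CLIFF does, here is a minimal, self-contained sketch of the general idea. It is not the library's code. It assumes one common formulation: each column is split into equal-width ranges, and each record is scored by multiplying, over its attributes, like^2 / (like + unlike), where like (unlike) is the fraction of records in its own class (in the other classes) that share its range. The top cliff_percentage of each class is kept. The helper names `discretize` and `cliff_sketch`, the equal-width binning, and the use of ready-made class labels instead of `objective_attr` are all assumptions for illustration.

```python
from typing import Hashable, List, Sequence


def discretize(rows: Sequence[Sequence[float]], bins: int) -> List[List[int]]:
    """Map every value to an equal-width range index 0..bins-1, column by column."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    out = []
    for row in rows:
        binned = []
        for v, lo, hi in zip(row, lows, highs):
            if hi == lo:
                binned.append(0)  # constant column: a single range
            else:
                binned.append(min(int((v - lo) / (hi - lo) * bins), bins - 1))
        out.append(binned)
    return out


def cliff_sketch(
    rows: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    percentage: float = 0.4,
    bins: int = 3,
) -> List[int]:
    """Return indices of the records kept by a simplified CLIFF (hypothetical helper)."""
    disc = discretize(rows, bins)
    kept: List[int] = []
    for cls in dict.fromkeys(labels):  # classes in order of first appearance
        in_c = [i for i, lab in enumerate(labels) if lab == cls]
        out_c = [i for i, lab in enumerate(labels) if lab != cls]
        power = {}
        for i in in_c:
            p = 1.0
            for j, b in enumerate(disc[i]):
                like = sum(disc[k][j] == b for k in in_c) / len(in_c)
                unlike = sum(disc[k][j] == b for k in out_c) / len(out_c) if out_c else 0.0
                # like > 0 because record i itself belongs to class cls
                p *= like * like / (like + unlike)
            power[i] = p
        n_keep = max(1, int(percentage * len(in_c)))
        ranked = sorted(in_c, key=lambda i: power[i], reverse=True)  # stable on ties
        kept.extend(ranked[:n_keep])
    return sorted(kept)
```

For example, with `rows = [[1.0], [1.1], [1.2], [5.0], [9.0], [9.1], [8.9], [5.1]]` and four records each of classes "a" and "b", `cliff_sketch(rows, labels, percentage=0.5)` returns `[0, 1, 4, 5]`: records 3 and 7 fall in the middle range shared by both classes, so they score lowest and are dropped.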
The MORPH function:
morph(attribute_names, data_matrix, independent_attrs, objective_attr, objective_as_binary=False, data_has_normalized=False, alpha=0.15, beta=0.35)
Parameter | Description
---|---
attribute_names | The attribute names. These should match the columns of data_matrix.
data_matrix | The original data.
independent_attrs | The independent attributes in the dataset. Note: identifier columns such as ‘name’ or ‘id’ should typically not be listed as independent attributes.
objective_attr | The attribute to be treated as the objective.
objective_as_binary | Whether to treat the objective as a binary attribute. Default: False
data_has_normalized | Whether data_matrix has already been normalized. Default: False
alpha | MORPH algorithm parameter. Default: 0.15
beta | MORPH algorithm parameter. Default: 0.35
return | The processed records.
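The roles of alpha and beta can be illustrated with a small sketch. It assumes the usual MORPH rule: each record x is moved relative to its nearest unlike neighbour z (the closest record of a different class) as y = x ± (x - z) * r, with r drawn uniformly from [alpha, beta], on data scaled to [0, 1]. The helper names, the per-attribute random sign, and min-max scaling as the meaning of "normalized" are assumptions, not confirmed details of the library.

```python
import math
import random
from typing import Hashable, List, Optional, Sequence


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def morph_sketch(
    rows: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    alpha: float = 0.15,
    beta: float = 0.35,
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Shift each record relative to its nearest unlike neighbour (hypothetical helper)."""
    rng = random.Random(seed)
    out = []
    for x, cls in zip(rows, labels):
        unlike = [r for r, lab in zip(rows, labels) if lab != cls]
        if not unlike:
            raise ValueError("MORPH needs records from at least two classes")
        z = min(unlike, key=lambda r: math.dist(x, r))  # nearest unlike neighbour
        y = []
        for xi, zi in zip(x, z):
            r = rng.uniform(alpha, beta)    # shaking degree for this attribute
            sign = rng.choice((-1.0, 1.0))  # assumed: random direction per attribute
            y.append(xi + sign * (xi - zi) * r)
        out.append(y)
    return out
```

Because every shift is a fraction between alpha and beta of the gap to the nearest record of another class, a larger beta moves records further from their original values, while keeping alpha and beta well below 0.5 keeps each record closer to its own side of the class boundary.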
The most convenient way to use LACE1 is:
lace1(attribute_names, data_matrix, independent_attrs, objective_attr, objective_as_binary=False, cliff_percentage=0.4, alpha=0.15, beta=0.35)
Parameter | Description
---|---
attribute_names | The attribute names. These should match the columns of data_matrix.
data_matrix | The original data.
independent_attrs | The independent attributes in the dataset. Note: identifier columns such as ‘name’ or ‘id’ should typically not be listed as independent attributes.
objective_attr | The attribute to be treated as the objective.
objective_as_binary | Whether to treat the objective as a binary attribute. Default: False
cliff_percentage | Prune rate for CLIFF, i.e. the fraction of records to retain. Default: 0.4
alpha | First MORPH parameter, controlling the degree of shaking. Default: 0.15
beta | Second MORPH parameter, controlling the degree of shaking. Default: 0.35
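All of these functions identify columns by name, so attribute_names, data_matrix, independent_attrs and objective_attr have to agree with each other. The hypothetical helper below (it is not part of LACE) shows one way to check that before calling lace1 or the other functions, and how the names map onto the rows of data_matrix. It assumes each row is a list with one value per attribute name, as in the CSV example further down, and that the selected cells hold numeric strings.

```python
from typing import List, Sequence, Tuple


def select_columns(
    attribute_names: Sequence[str],
    data_matrix: Sequence[Sequence[str]],
    independent_attrs: Sequence[str],
    objective_attr: str,
) -> Tuple[List[List[float]], List[float]]:
    """Return (independent values, objective values), picked out by attribute name."""
    # Each row must hold exactly one value per attribute name.
    for n, row in enumerate(data_matrix):
        if len(row) != len(attribute_names):
            raise ValueError(f"row {n} has {len(row)} values, expected {len(attribute_names)}")
    # Every requested attribute must be one of attribute_names.
    missing = [a for a in [*independent_attrs, objective_attr] if a not in attribute_names]
    if missing:
        raise ValueError(f"unknown attributes: {missing}")
    col = {name: i for i, name in enumerate(attribute_names)}
    # Assumes the selected cells are numeric strings (e.g. as read by csv.reader).
    x = [[float(row[col[a]]) for a in independent_attrs] for row in data_matrix]
    y = [float(row[col[objective_attr]]) for row in data_matrix]
    return x, y
```

Note that columns such as ‘id’ are simply left out of independent_attrs; they stay in data_matrix but are never selected.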
The data selector and processor used in LACE2:
add_to_bin(attribute_names, try2add_data_matrix, independent_attrs, objective_attr, objective_as_binary=False, cliff_percentage=0.4, morph_alpha=0.15, morph_beta=0.35, passing_bin=None)
Parameter | Description
---|---
attribute_names | The attribute names. These should match the columns of try2add_data_matrix.
try2add_data_matrix | The data the current holder has and tries to add to the bin.
independent_attrs | The independent attributes in the dataset. Note: identifier columns such as ‘name’ or ‘id’ should typically not be listed as independent attributes.
objective_attr | The attribute to be treated as the objective.
objective_as_binary | Whether to treat the objective as a binary attribute. Default: False
cliff_percentage | Prune rate for CLIFF, i.e. the fraction of records to retain. Default: 0.4
morph_alpha | First MORPH parameter, controlling the degree of shaking. Default: 0.15
morph_beta | Second MORPH parameter, controlling the degree of shaking. Default: 0.35
passing_bin | The data received from another data holder. Set to None if there is no passing data. Default: None
return | The new passing_bin. NOTE: the result must be assigned to a variable; the variable passed as passing_bin will NOT be updated to the new bin.
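The bin-passing contract of add_to_bin can be illustrated with a simplified sketch. Each data holder receives passing_bin, adds only those of its own records that are not too close to anything already in the bin, and hands the result on. The distance rule here (Euclidean distance compared with a fixed threshold, a leader-follower style test) and the name `add_to_bin_sketch` are assumptions. The sketch also leaves out the CLIFF and MORPH steps that the real function's parameters suggest it performs.

```python
import math
from typing import List, Optional, Sequence


def add_to_bin_sketch(
    candidates: Sequence[Sequence[float]],
    passing_bin: Optional[Sequence[Sequence[float]]] = None,
    threshold: float = 0.5,
) -> List[List[float]]:
    """Return a NEW bin; the passing_bin argument is copied, never modified."""
    new_bin = [list(r) for r in (passing_bin or [])]
    for row in candidates:
        # Keep a candidate only if no record already in the bin is within `threshold`.
        if not new_bin or min(math.dist(row, b) for b in new_bin) > threshold:
            new_bin.append(list(row))
    return new_bin
```

Because the sketch returns a new list, the caller has to write `bin_ = add_to_bin_sketch(rows, bin_)`, which mirrors the NOTE on add_to_bin's return value.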
LACE also provides a simple LACE2 application simulator. It automatically distributes all of the data among different members, in unequal shares:
lace2_simulator(attribute_names, data_matrix, independent_attrs, objective_attr, objective_as_binary=False, cliff_percentage=0.4, morph_alpha=0.15, morph_beta=0.35, number_of_holder=5)
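The unequal distribution that lace2_simulator performs can be pictured with a short sketch. The documentation does not describe the simulator's actual splitting rule, so here, as an assumption, the data is cut into number_of_holder contiguous, non-empty chunks at random positions, which generally gives chunks of different sizes. The name `split_unequally` is hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def split_unequally(
    data: Sequence[T], number_of_holder: int = 5, seed: Optional[int] = None
) -> List[List[T]]:
    """Cut data into number_of_holder non-empty, contiguous chunks at random positions."""
    if not 1 <= number_of_holder <= len(data):
        raise ValueError("need 1 <= number_of_holder <= len(data)")
    rng = random.Random(seed)
    # Pick number_of_holder - 1 distinct cut points; random cuts usually give unequal sizes.
    cuts = sorted(rng.sample(range(1, len(data)), number_of_holder - 1))
    bounds = [0, *cuts, len(data)]
    return [list(data[a:b]) for a, b in zip(bounds, bounds[1:])]
```

In a LACE2-style run, each chunk would then be processed in turn by add_to_bin, with the bin returned for one holder passed as passing_bin to the next.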
Here is a complete, simple example of processing data. The data comes from Data.Gov:
import csv
import lace

# Read the CSV file; its first row holds the attribute names
with open('example.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)
    data = list()
    for line in reader:
        data.append(line)

attribute_names = header
data_matrix = data
independent_attrs = ['ADM_RATE', 'SAT_AVG', 'TUITFTE', 'RET_FT4', 'PCTFLOAN', 'PCTPELL', 'DEBT_MDN', 'C150_4', 'CDR3']
objective_attr = 'mn_earn_wne_p7'
# CLIFF: keep a fraction (cliff_percentage=0.4) of the records
aftercliff = lace.cliff(attribute_names, data_matrix, independent_attrs, objective_attr, False, 0.4)
assert len(aftercliff) < 600

# MORPH: same number of records, but the values are shifted
aftermorph = lace.morph(attribute_names, aftercliff, independent_attrs, objective_attr, False, False, 0.15, 0.35)
assert len(aftermorph) == len(aftercliff) and aftermorph[0] != aftercliff[0]

# LACE1 in a single call
lace1res = lace.lace1(attribute_names, data_matrix, independent_attrs, objective_attr, False, 0.4, 0.15, 0.35)
assert len(lace1res) < len(data) * 0.5

# LACE2: start from a bin (header row + first 50 records) and try to add records 200-699
bins = [header] + data[:50]
try2add_data_matrix = data[200:700]
bins = lace.add_to_bin(attribute_names, try2add_data_matrix, independent_attrs, objective_attr, False, 0.4, 0.15, 0.35, bins)
assert len(bins) < 550

# Simulate LACE2 with 5 data holders
lace2res = lace.lace2_simulator(attribute_names, data_matrix, independent_attrs, objective_attr, False, 0.4, 0.15, 0.35, number_of_holder=5)
assert len(lace2res) < len(lace1res)