python实现随机森林random forest的原理及方法


Posted in Python onDecember 21, 2017

引言

想通过随机森林来获取数据的主要特征

1、理论

随机森林是一个高度灵活的机器学习方法,拥有广泛的应用前景,从市场营销到医疗保健保险。 既可以用来做市场营销模拟的建模,统计客户来源,保留和流失。也可用来预测疾病的风险和病患者的易感性。

根据个体学习器的生成方式,目前的集成学习方法大致可分为两大类,即个体学习器之间存在强依赖关系,必须串行生成的序列化方法,以及个体学习器间不存在强依赖关系,可同时生成的并行化方法;

前者的代表是Boosting,后者的代表是Bagging和“随机森林”(Random
Forest)

随机森林在以决策树为基学习器构建Bagging集成的基础上,进一步在决策树的训练过程中引入了随机属性选择(即引入随机特征选择)。

简单来说,随机森林就是对决策树的集成,但有两点不同:

(2)特征选取的差异性:每个决策树的n个分类特征是在所有特征中随机选择的(n是一个需要我们自己调整的参数)
随机森林,简单理解, 比如预测salary,就是构建多个决策树job,age,house,然后根据要预测的量的各个特征(teacher,39,suburb)分别在对应决策树的目标值概率(salary<5000,salary>=5000),从而,确定预测量的发生概率(如,预测出P(salary<5000)=0.3).

随机森林是一个可做能够回归和分类。 它具备处理大数据的特性,而且它有助于估计或变量是非常重要的基础数据建模。

参数说明:

最主要的两个参数是n_estimators和max_features。

n_estimators:表示森林里树的个数。理论上是越大越好。但是伴随着就是计算时间的增长。但是并不是取得越大就会越好,预测效果最好的将会出现在合理的树个数。

max_features:随机选择特征集合的子集合,并用来分割节点。子集合的个数越少,方差就会减少的越快,但同时偏差就会增加的越快。根据较好的实践经验。如果是回归问题则:

max_features=n_features,如果是分类问题则max_features=sqrt(n_features)。

如果想获取较好的结果,必须将max_depth=None,同时min_sample_split=1。
同时还要记得进行cross_validated(交叉验证),除此之外记得在random forest中,bootstrap=True。但在extra-trees中,bootstrap=False。

2、随机森林python实现

2.1Demo1

实现随机森林基本功能

#随机森林
from sklearn.tree import DecisionTreeRegressor 
from sklearn.ensemble import RandomForestRegressor 
import numpy as np 

from sklearn.datasets import load_iris 
iris=load_iris() 
#print iris#iris的4个属性是:萼片宽度 萼片长度 花瓣宽度 花瓣长度 标签是花的种类:setosa versicolour virginica 
print(iris['target'].shape)
rf=RandomForestRegressor()#这里使用了默认的参数设置 
rf.fit(iris.data[:150],iris.target[:150])#进行模型的训练 

#随机挑选两个预测不相同的样本 
instance=iris.data[[100,109]] 
print(instance)
rf.predict(instance[[0]])
print('instance 0 prediction;',rf.predict(instance[[0]]))
print( 'instance 1 prediction;',rf.predict(instance[[1]]))
print(iris.target[100],iris.target[109])

运行结果

(150,)
[[ 6.3  3.3  6.   2.5]
 [ 7.2  3.6  6.1  2.5]]
instance 0 prediction; [ 2.]
instance 1 prediction; [ 2.]
2 2

2.2 Demo2

3种方法的比较

#random forest test
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier
X, y = make_blobs(n_samples=10000, n_features=10, centers=100,random_state=0)

clf = DecisionTreeClassifier(max_depth=None, min_samples_split=2,random_state=0)
scores = cross_val_score(clf, X, y)
print(scores.mean())    


clf = RandomForestClassifier(n_estimators=10, max_depth=None,min_samples_split=2, random_state=0)
scores = cross_val_score(clf, X, y)
print(scores.mean())    

clf = ExtraTreesClassifier(n_estimators=10, max_depth=None,min_samples_split=2, random_state=0)
scores = cross_val_score(clf, X, y)
print(scores.mean())

运行结果:

0.979408793821
0.999607843137
0.999898989899

2.3 Demo3-实现特征选择

#随机森林2
from sklearn.tree import DecisionTreeRegressor 
from sklearn.ensemble import RandomForestRegressor 
import numpy as np 
from sklearn.datasets import load_iris 
iris=load_iris() 

from sklearn.model_selection import cross_val_score, ShuffleSplit 
X = iris["data"] 
Y = iris["target"] 
names = iris["feature_names"] 
rf = RandomForestRegressor() 
scores = [] 
for i in range(X.shape[1]): 
 score = cross_val_score(rf, X[:, i:i+1], Y, scoring="r2", 
    cv=ShuffleSplit(len(X), 3, .3)) 
 scores.append((round(np.mean(score), 3), names[i])) 
print(sorted(scores, reverse=True))

运行结果:

[(0.89300000000000002, 'petal width (cm)'), (0.82099999999999995, 'petal length
(cm)'), (0.13, 'sepal length (cm)'), (-0.79100000000000004, 'sepal width (cm)')]

2.4 demo4-随机森林

本来想利用以下代码来构建随机随机森林决策树,但是,遇到的问题是,程序一直在运行,无法响应,还需要调试。

#随机森林4
#coding:utf-8 
import csv 
from random import seed 
from random import randrange 
from math import sqrt 
def loadCSV(filename):#加载数据,一行行的存入列表 
 dataSet = [] 
 with open(filename, 'r') as file: 
 csvReader = csv.reader(file) 
 for line in csvReader: 
  dataSet.append(line) 
 return dataSet 

# 除了标签列,其他列都转换为float类型 
def column_to_float(dataSet): 
 featLen = len(dataSet[0]) - 1 
 for data in dataSet: 
 for column in range(featLen): 
  data[column] = float(data[column].strip()) 

# 将数据集随机分成N块,方便交叉验证,其中一块是测试集,其他四块是训练集 
def spiltDataSet(dataSet, n_folds): 
 fold_size = int(len(dataSet) / n_folds) 
 dataSet_copy = list(dataSet) 
 dataSet_spilt = [] 
 for i in range(n_folds): 
 fold = [] 
 while len(fold) < fold_size: # 这里不能用if,if只是在第一次判断时起作用,while执行循环,直到条件不成立 
  index = randrange(len(dataSet_copy)) 
  fold.append(dataSet_copy.pop(index)) # pop() 函数用于移除列表中的一个元素(默认最后一个元素),并且返回该元素的值。 
 dataSet_spilt.append(fold) 
 return dataSet_spilt 

# 构造数据子集 
def get_subsample(dataSet, ratio): 
 subdataSet = [] 
 lenSubdata = round(len(dataSet) * ratio)#返回浮点数 
 while len(subdataSet) < lenSubdata: 
 index = randrange(len(dataSet) - 1) 
 subdataSet.append(dataSet[index]) 
 # print len(subdataSet) 
 return subdataSet 

# 分割数据集 
def data_spilt(dataSet, index, value): 
 left = [] 
 right = [] 
 for row in dataSet: 
 if row[index] < value: 
  left.append(row) 
 else: 
  right.append(row) 
 return left, right 

# 计算分割代价 
def spilt_loss(left, right, class_values): 
 loss = 0.0 
 for class_value in class_values: 
 left_size = len(left) 
 if left_size != 0: # 防止除数为零 
  prop = [row[-1] for row in left].count(class_value) / float(left_size) 
  loss += (prop * (1.0 - prop)) 
 right_size = len(right) 
 if right_size != 0: 
  prop = [row[-1] for row in right].count(class_value) / float(right_size) 
  loss += (prop * (1.0 - prop)) 
 return loss 

# 选取任意的n个特征,在这n个特征中,选取分割时的最优特征 
def get_best_spilt(dataSet, n_features): 
 features = [] 
 class_values = list(set(row[-1] for row in dataSet)) 
 b_index, b_value, b_loss, b_left, b_right = 999, 999, 999, None, None 
 while len(features) < n_features: 
 index = randrange(len(dataSet[0]) - 1) 
 if index not in features: 
  features.append(index) 
 # print 'features:',features 
 for index in features:#找到列的最适合做节点的索引,(损失最小) 
 for row in dataSet: 
  left, right = data_spilt(dataSet, index, row[index])#以它为节点的,左右分支 
  loss = spilt_loss(left, right, class_values) 
  if loss < b_loss:#寻找最小分割代价 
  b_index, b_value, b_loss, b_left, b_right = index, row[index], loss, left, right 
 # print b_loss 
 # print type(b_index) 
 return {'index': b_index, 'value': b_value, 'left': b_left, 'right': b_right} 

# 决定输出标签 
def decide_label(data): 
 output = [row[-1] for row in data] 
 return max(set(output), key=output.count) 

# 子分割,不断地构建叶节点的过程对对对 
def sub_spilt(root, n_features, max_depth, min_size, depth): 
 left = root['left'] 
 # print left 
 right = root['right'] 
 del (root['left']) 
 del (root['right']) 
 # print depth 
 if not left or not right: 
 root['left'] = root['right'] = decide_label(left + right) 
 # print 'testing' 
 return 
 if depth > max_depth: 
 root['left'] = decide_label(left) 
 root['right'] = decide_label(right) 
 return 
 if len(left) < min_size: 
 root['left'] = decide_label(left) 
 else: 
 root['left'] = get_best_spilt(left, n_features) 
 # print 'testing_left' 
 sub_spilt(root['left'], n_features, max_depth, min_size, depth + 1) 
 if len(right) < min_size: 
 root['right'] = decide_label(right) 
 else: 
 root['right'] = get_best_spilt(right, n_features) 
 # print 'testing_right' 
 sub_spilt(root['right'], n_features, max_depth, min_size, depth + 1) 

 # 构造决策树 
def build_tree(dataSet, n_features, max_depth, min_size): 
 root = get_best_spilt(dataSet, n_features) 
 sub_spilt(root, n_features, max_depth, min_size, 1) 
 return root 
# 预测测试集结果 
def predict(tree, row): 
 predictions = [] 
 if row[tree['index']] < tree['value']: 
 if isinstance(tree['left'], dict): 
  return predict(tree['left'], row) 
 else: 
  return tree['left'] 
 else: 
 if isinstance(tree['right'], dict): 
  return predict(tree['right'], row) 
 else: 
  return tree['right'] 
  # predictions=set(predictions) 
def bagging_predict(trees, row): 
 predictions = [predict(tree, row) for tree in trees] 
 return max(set(predictions), key=predictions.count) 
# 创建随机森林 
def random_forest(train, test, ratio, n_feature, max_depth, min_size, n_trees): 
 trees = [] 
 for i in range(n_trees): 
 train = get_subsample(train, ratio)#从切割的数据集中选取子集 
 tree = build_tree(train, n_features, max_depth, min_size) 
 # print 'tree %d: '%i,tree 
 trees.append(tree) 
 # predict_values = [predict(trees,row) for row in test] 
 predict_values = [bagging_predict(trees, row) for row in test] 
 return predict_values 
# 计算准确率 
def accuracy(predict_values, actual): 
 correct = 0 
 for i in range(len(actual)): 
 if actual[i] == predict_values[i]: 
  correct += 1 
 return correct / float(len(actual)) 
if __name__ == '__main__': 
 seed(1) 
 dataSet = loadCSV(r'G:\0研究生\tianchiCompetition\训练小样本2.csv') 
 column_to_float(dataSet) 
 n_folds = 5 
 max_depth = 15 
 min_size = 1 
 ratio = 1.0 
 # n_features=sqrt(len(dataSet)-1) 
 n_features = 15 
 n_trees = 10 
 folds = spiltDataSet(dataSet, n_folds)#先是切割数据集 
 scores = [] 
 for fold in folds: 
 train_set = folds[ 
   :] # 此处不能简单地用train_set=folds,这样用属于引用,那么当train_set的值改变的时候,folds的值也会改变,所以要用复制的形式。(L[:])能够复制序列,D.copy() 能够复制字典,list能够生成拷贝 list(L) 
 train_set.remove(fold)#选好训练集 
 # print len(folds) 
 train_set = sum(train_set, []) # 将多个fold列表组合成一个train_set列表 
 # print len(train_set) 
 test_set = [] 
 for row in fold: 
  row_copy = list(row) 
  row_copy[-1] = None 
  test_set.append(row_copy) 
  # for row in test_set: 
  # print row[-1] 
 actual = [row[-1] for row in fold] 
 predict_values = random_forest(train_set, test_set, ratio, n_features, max_depth, min_size, n_trees) 
 accur = accuracy(predict_values, actual) 
 scores.append(accur) 
 print ('Trees is %d' % n_trees) 
 print ('scores:%s' % scores) 
 print ('mean score:%s' % (sum(scores) / float(len(scores))))

2.5 随机森林分类sonic data

# CART on the Bank Note dataset
from random import seed
from random import randrange
from csv import reader

# Load a CSV file
def load_csv(filename):
 file = open(filename, "r")
 lines = reader(file)
 dataset = list(lines)
 return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
 for row in dataset:
 row[column] = float(row[column].strip())

# Split a dataset into k folds
def cross_validation_split(dataset, n_folds):
 dataset_split = list()
 dataset_copy = list(dataset)
 fold_size = int(len(dataset) / n_folds)
 for i in range(n_folds):
 fold = list()
 while len(fold) < fold_size:
  index = randrange(len(dataset_copy))
  fold.append(dataset_copy.pop(index))
 dataset_split.append(fold)
 return dataset_split

# Calculate accuracy percentage
def accuracy_metric(actual, predicted):
 correct = 0
 for i in range(len(actual)):
 if actual[i] == predicted[i]:
  correct += 1
 return correct / float(len(actual)) * 100.0

# Evaluate an algorithm using a cross validation split
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
 folds = cross_validation_split(dataset, n_folds)
 scores = list()
 for fold in folds:
 train_set = list(folds)
 train_set.remove(fold)
 train_set = sum(train_set, [])
 test_set = list()
 for row in fold:
  row_copy = list(row)
  test_set.append(row_copy)
  row_copy[-1] = None
 predicted = algorithm(train_set, test_set, *args)
 actual = [row[-1] for row in fold]
 accuracy = accuracy_metric(actual, predicted)
 scores.append(accuracy)
 return scores

# Split a data set based on an attribute and an attribute value
def test_split(index, value, dataset):
 left, right = list(), list()
 for row in dataset:
 if row[index] < value:
  left.append(row)
 else:
  right.append(row)
 return left, right

# Calculate the Gini index for a split dataset
def gini_index(groups, class_values):
 gini = 0.0
 for class_value in class_values:
 for group in groups:
  size = len(group)
  if size == 0:
  continue
  proportion = [row[-1] for row in group].count(class_value) / float(size)
  gini += (proportion * (1.0 - proportion))
 return gini

# Select the best split point for a dataset
def get_split(dataset):
 class_values = list(set(row[-1] for row in dataset))
 b_index, b_value, b_score, b_groups = 999, 999, 999, None
 for index in range(len(dataset[0])-1):
 for row in dataset:
  groups = test_split(index, row[index], dataset)
  gini = gini_index(groups, class_values)
  if gini < b_score:
  b_index, b_value, b_score, b_groups = index, row[index], gini, groups
 print ({'index':b_index, 'value':b_value})
 return {'index':b_index, 'value':b_value, 'groups':b_groups}

# Create a terminal node value
def to_terminal(group):
 outcomes = [row[-1] for row in group]
 return max(set(outcomes), key=outcomes.count)

# Create child splits for a node or make terminal
def split(node, max_depth, min_size, depth):
 left, right = node['groups']
 del(node['groups'])
 # check for a no split
 if not left or not right:
 node['left'] = node['right'] = to_terminal(left + right)
 return
 # check for max depth
 if depth >= max_depth:
 node['left'], node['right'] = to_terminal(left), to_terminal(right)
 return
 # process left child
 if len(left) <= min_size:
 node['left'] = to_terminal(left)
 else:
 node['left'] = get_split(left)
 split(node['left'], max_depth, min_size, depth+1)
 # process right child
 if len(right) <= min_size:
 node['right'] = to_terminal(right)
 else:
 node['right'] = get_split(right)
 split(node['right'], max_depth, min_size, depth+1)

# Build a decision tree
def build_tree(train, max_depth, min_size):
 root = get_split(train)
 split(root, max_depth, min_size, 1)
 return root

# Make a prediction with a decision tree
def predict(node, row):
 if row[node['index']] < node['value']:
 if isinstance(node['left'], dict):
  return predict(node['left'], row)
 else:
  return node['left']
 else:
 if isinstance(node['right'], dict):
  return predict(node['right'], row)
 else:
  return node['right']

# Classification and Regression Tree Algorithm
def decision_tree(train, test, max_depth, min_size):
 tree = build_tree(train, max_depth, min_size)
 predictions = list()
 for row in test:
 prediction = predict(tree, row)
 predictions.append(prediction)
 return(predictions)

# Test CART on Bank Note dataset
seed(1)
# load and prepare data
filename = r'G:\0pythonstudy\决策树\sonar.all-data.csv'
dataset = load_csv(filename)
# convert string attributes to integers
for i in range(len(dataset[0])-1):
 str_column_to_float(dataset, i)
# evaluate algorithm
n_folds = 5
max_depth = 5
min_size = 10
scores = evaluate_algorithm(dataset, decision_tree, n_folds, max_depth, min_size)
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

运行结果:

{'index': 38, 'value': 0.0894}
{'index': 36, 'value': 0.8459}
{'index': 50, 'value': 0.0024}
{'index': 15, 'value': 0.0906}
{'index': 16, 'value': 0.9819}
{'index': 10, 'value': 0.0785}
{'index': 16, 'value': 0.0886}
{'index': 38, 'value': 0.0621}
{'index': 5, 'value': 0.0226}
{'index': 8, 'value': 0.0368}
{'index': 11, 'value': 0.0754}
{'index': 0, 'value': 0.0239}
{'index': 8, 'value': 0.0368}
{'index': 29, 'value': 0.1671}
{'index': 46, 'value': 0.0237}
{'index': 38, 'value': 0.0621}
{'index': 14, 'value': 0.0668}
{'index': 4, 'value': 0.0167}
{'index': 37, 'value': 0.0836}
{'index': 12, 'value': 0.0616}
{'index': 7, 'value': 0.0333}
{'index': 33, 'value': 0.8741}
{'index': 16, 'value': 0.0886}
{'index': 8, 'value': 0.0368}
{'index': 33, 'value': 0.0798}
{'index': 44, 'value': 0.0298}
Scores: [48.78048780487805, 70.73170731707317, 58.536585365853654, 51.2195121951
2195, 39.02439024390244]
Mean Accuracy: 53.659%
请按任意键继续. . .

知识点:

1.load CSV file

from csv import reader
# Load a CSV file
def load_csv(filename):
 file = open(filename, "r")
 lines = reader(file)
 dataset = list(lines)
 return dataset
filename = r'G:\0pythonstudy\决策树\sonar.all-data.csv'
dataset=load_csv(filename)
print(dataset)

2.把数据转化成float格式

# Convert string column to float
def str_column_to_float(dataset, column):
 for row in dataset:
 row[column] = float(row[column].strip())
 # print(row[column])
# convert string attributes to integers
for i in range(len(dataset[0])-1):
 str_column_to_float(dataset, i)

3.把最后一列的分类字符串转化成0、1整数

def str_column_to_int(dataset, column):
 class_values = [row[column] for row in dataset]#生成一个class label的list
 # print(class_values)
 unique = set(class_values)#set 获得list的不同元素
 print(unique)
 lookup = dict()#定义一个字典
 # print(enumerate(unique))
 for i, value in enumerate(unique):
 lookup[value] = i
 # print(lookup)
 for row in dataset:
 row[column] = lookup[row[column]]
 print(lookup['M'])

4、把数据集分割成K份

# Split a dataset into k folds
def cross_validation_split(dataset, n_folds):
 dataset_split = list()#生成空列表
 dataset_copy = list(dataset)
 print(len(dataset_copy))
 print(len(dataset))
 #print(dataset_copy)
 fold_size = int(len(dataset) / n_folds)
 for i in range(n_folds):
 fold = list()
 while len(fold) < fold_size:
  index = randrange(len(dataset_copy))
  # print(index)
  fold.append(dataset_copy.pop(index))#使用.pop()把里边的元素都删除(相当于转移),这k份元素各不相同。
 dataset_split.append(fold)
 return dataset_split
n_folds=5 
folds = cross_validation_split(dataset, n_folds)#k份元素各不相同的训练集

5.计算正确率

# Calculate accuracy percentage
def accuracy_metric(actual, predicted):
 correct = 0
 for i in range(len(actual)):
 if actual[i] == predicted[i]:
  correct += 1
 return correct / float(len(actual)) * 100.0#这个是二值分类正确性的表达式

6.二分类每列

# Split a data set based on an attribute and an attribute value
def test_split(index, value, dataset):
 left, right = list(), list()#初始化两个空列表
 for row in dataset:
 if row[index] < value:
  left.append(row)
 else:
  right.append(row)
 return left, right #返回两个列表,每个列表以value为界限对指定行(index)进行二分类。

7.使用gini系数来获得最佳分割点

# Calculate the Gini index for a split dataset
def gini_index(groups, class_values):
 gini = 0.0
 for class_value in class_values:
 for group in groups:
  size = len(group)
  if size == 0:
  continue
  proportion = [row[-1] for row in group].count(class_value) / float(size)
  gini += (proportion * (1.0 - proportion))
 return gini

# Select the best split point for a dataset
def get_split(dataset):
 class_values = list(set(row[-1] for row in dataset))
 b_index, b_value, b_score, b_groups = 999, 999, 999, None
 for index in range(len(dataset[0])-1):
 for row in dataset:
  groups = test_split(index, row[index], dataset)
  gini = gini_index(groups, class_values)
  if gini < b_score:
  b_index, b_value, b_score, b_groups = index, row[index], gini, groups
 # print(groups)
 print ({'index':b_index, 'value':b_value,'score':gini})
 return {'index':b_index, 'value':b_value, 'groups':b_groups}

这段代码,在求gini指数,直接应用定义式,不难理解。获得最佳分割点可能比较难看懂,这里用了两层迭代,一层是对不同列的迭代,一层是对不同行的迭代。并且,每次迭代,都对gini系数进行更新。

8、决策树生成

# Create child splits for a node or make terminal
def split(node, max_depth, min_size, depth):
 left, right = node['groups']
 del(node['groups'])
 # check for a no split
 if not left or not right:
 node['left'] = node['right'] = to_terminal(left + right)
 return
 # check for max depth
 if depth >= max_depth:
 node['left'], node['right'] = to_terminal(left), to_terminal(right)
 return
 # process left child
 if len(left) <= min_size:
 node['left'] = to_terminal(left)
 else:
 node['left'] = get_split(left)
 split(node['left'], max_depth, min_size, depth+1)
 # process right child
 if len(right) <= min_size:
 node['right'] = to_terminal(right)
 else:
 node['right'] = get_split(right)
 split(node['right'], max_depth, min_size, depth+1)

这里使用了递归编程,不断生成左叉树和右叉树。

9.构建决策树

# Build a decision tree
def build_tree(train, max_depth, min_size):
 root = get_split(train)
 split(root, max_depth, min_size, 1)
 return root 
tree=build_tree(train_set, max_depth, min_size)
print(tree)

10、预测test集

# Build a decision tree
def build_tree(train, max_depth, min_size):
 root = get_split(train)#获得最好的分割点,下标值,groups
 split(root, max_depth, min_size, 1)
 return root 
# tree=build_tree(train_set, max_depth, min_size)
# print(tree) 

# Make a prediction with a decision tree
def predict(node, row):
 print(row[node['index']])
 print(node['value'])
 if row[node['index']] < node['value']:#用测试集来代入训练的最好分割点,分割点有偏差时,通过搜索左右叉树来进一步比较。
 if isinstance(node['left'], dict):#如果是字典类型,执行操作
  return predict(node['left'], row)
 else:
  return node['left']
 else:
 if isinstance(node['right'], dict):
  return predict(node['right'], row)
 else:
  return node['right']
tree = build_tree(train_set, max_depth, min_size)
predictions = list()
for row in test_set:
 prediction = predict(tree, row)
 predictions.append(prediction)

11.评价决策树

# Evaluate an algorithm using a cross validation split
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
 folds = cross_validation_split(dataset, n_folds)
 scores = list()
 for fold in folds:
 train_set = list(folds)
 train_set.remove(fold)
 train_set = sum(train_set, [])
 test_set = list()
 for row in fold:
  row_copy = list(row)
  test_set.append(row_copy)
  row_copy[-1] = None
 predicted = algorithm(train_set, test_set, *args)
 actual = [row[-1] for row in fold]
 accuracy = accuracy_metric(actual, predicted)
 scores.append(accuracy)
 return scores

以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持三水点靠木。

Python 相关文章推荐
浅谈Python中的可变对象和不可变对象
Jul 07 Python
Django的分页器实例(paginator)
Dec 01 Python
Python实现基于TCP UDP协议的IPv4 IPv6模式客户端和服务端功能示例
Mar 22 Python
python 读取文件并替换字段的实例
Jul 12 Python
Pyqt QImage 与 np array 转换方法
Jun 27 Python
Django Aggregation聚合使用方法解析
Aug 01 Python
解决Python对齐文本字符串问题
Aug 28 Python
python scipy卷积运算的实现方法
Sep 16 Python
在Python中画图(基于Jupyter notebook的魔法函数)
Oct 28 Python
快速解决jupyter启动卡死的问题
Apr 10 Python
python的pip有什么用
Jun 17 Python
浅析python 字典嵌套
Sep 29 Python
python编写分类决策树的代码
Dec 21 #Python
Python基于PyGraphics包实现图片截取功能的方法
Dec 21 #Python
用Python写王者荣耀刷金币脚本
Dec 21 #Python
python使用Apriori算法进行关联性解析
Dec 21 #Python
python实现kMeans算法
Dec 21 #Python
利用Tkinter(python3.6)实现一个简单计算器
Dec 21 #Python
python编写朴素贝叶斯用于文本分类
Dec 21 #Python
You might like
Terran兵种介绍
2020/03/14 星际争霸
PHP调用三种数据库的方法(3)
2006/10/09 PHP
php学习 函数 课件
2008/06/15 PHP
php 什么是PEAR?
2009/03/19 PHP
php和js如何通过json互相传递数据相关问题探讨
2013/02/26 PHP
强制PHP命令行脚本单进程运行的方法
2014/04/15 PHP
实例讲解PHP设计模式编程中的简单工厂模式
2016/02/29 PHP
PHP实现双链表删除与插入节点的方法示例
2017/11/11 PHP
用于判断用户注册时,密码强度的JS代码
2009/01/01 Javascript
JavaScript入门教程(2) JS基础知识
2009/01/31 Javascript
Javascript中的变量使用说明
2010/05/18 Javascript
Javascript 实现的数独解题算法网页实例
2013/10/15 Javascript
浅析JS中什么是自定义react数据验证组件
2018/10/19 Javascript
webpack中如何使用雪碧图的示例代码
2018/11/11 Javascript
基于Koa2写个脚手架模拟接口服务的方法
2018/11/27 Javascript
vue实现鼠标移入移出事件代码实例
2019/03/27 Javascript
js实现飞机大战小游戏
2020/08/26 Javascript
[47:55]Ti4第二日主赛事败者组 NaVi vs EG 1
2014/07/20 DOTA
[01:16:37]【全国守擂赛】第三周决赛 Dark Knight vs. 一个弱队
2020/05/04 DOTA
python新手经常遇到的17个错误分析
2014/07/30 Python
python pandas.DataFrame选取、修改数据最好用.loc,.iloc,.ix实现
2018/06/11 Python
pyqt 实现在Widgets中显示图片和文字的方法
2019/06/13 Python
Python绘制股票移动均线的实例
2019/08/24 Python
python Dijkstra算法实现最短路径问题的方法
2019/09/19 Python
Python使用os.listdir和os.walk获取文件路径
2020/05/21 Python
使用Python pip怎么升级pip
2020/08/11 Python
关于HTML5的安全问题开发人员需要牢记的
2012/06/21 HTML / CSS
Gap加拿大官网:Gap Canada
2017/08/24 全球购物
时尚设计师手表:The Watch Cabin
2018/10/06 全球购物
兼职业务员岗位职责
2014/01/01 职场文书
创业计划书——互联网商机
2014/01/12 职场文书
爱国卫生月活动总结范文
2014/04/25 职场文书
群众路线教育实践活动心得体会(四风)
2014/11/03 职场文书
2014年学校体育工作总结
2014/12/08 职场文书
JavaScript最完整的深浅拷贝实现方式详解
2022/02/28 Javascript
Java十分钟精通进阶适配器模式
2022/04/06 Java/Android