python实现随机森林random forest的原理及方法


Posted in Python onDecember 21, 2017

引言

想通过随机森林来获取数据的主要特征

1、理论

随机森林是一个高度灵活的机器学习方法,拥有广泛的应用前景,从市场营销到医疗保健保险。 既可以用来做市场营销模拟的建模,统计客户来源,保留和流失。也可用来预测疾病的风险和病患者的易感性。

根据个体学习器的生成方式,目前的集成学习方法大致可分为两大类,即个体学习器之间存在强依赖关系,必须串行生成的序列化方法,以及个体学习器间不存在强依赖关系,可同时生成的并行化方法;

前者的代表是Boosting,后者的代表是Bagging和“随机森林”(Random
Forest)

随机森林在以决策树为基学习器构建Bagging集成的基础上,进一步在决策树的训练过程中引入了随机属性选择(即引入随机特征选择)。

简单来说,随机森林就是对决策树的集成,但有两点不同:

(2)特征选取的差异性:每个决策树的n个分类特征是在所有特征中随机选择的(n是一个需要我们自己调整的参数)
随机森林,简单理解, 比如预测salary,就是构建多个决策树job,age,house,然后根据要预测的量的各个特征(teacher,39,suburb)分别在对应决策树的目标值概率(salary<5000,salary>=5000),从而,确定预测量的发生概率(如,预测出P(salary<5000)=0.3).

随机森林是一个可做能够回归和分类。 它具备处理大数据的特性,而且它有助于估计或变量是非常重要的基础数据建模。

参数说明:

最主要的两个参数是n_estimators和max_features。

n_estimators:表示森林里树的个数。理论上是越大越好。但是伴随着就是计算时间的增长。但是并不是取得越大就会越好,预测效果最好的将会出现在合理的树个数。

max_features:随机选择特征集合的子集合,并用来分割节点。子集合的个数越少,方差就会减少的越快,但同时偏差就会增加的越快。根据较好的实践经验。如果是回归问题则:

max_features=n_features,如果是分类问题则max_features=sqrt(n_features)。

如果想获取较好的结果,必须将max_depth=None,同时min_sample_split=1。
同时还要记得进行cross_validated(交叉验证),除此之外记得在random forest中,bootstrap=True。但在extra-trees中,bootstrap=False。

2、随机森林python实现

2.1Demo1

实现随机森林基本功能

#随机森林
from sklearn.tree import DecisionTreeRegressor 
from sklearn.ensemble import RandomForestRegressor 
import numpy as np 

from sklearn.datasets import load_iris 
iris=load_iris() 
#print iris#iris的4个属性是:萼片宽度 萼片长度 花瓣宽度 花瓣长度 标签是花的种类:setosa versicolour virginica 
print(iris['target'].shape)
rf=RandomForestRegressor()#这里使用了默认的参数设置 
rf.fit(iris.data[:150],iris.target[:150])#进行模型的训练 

#随机挑选两个预测不相同的样本 
instance=iris.data[[100,109]] 
print(instance)
rf.predict(instance[[0]])
print('instance 0 prediction;',rf.predict(instance[[0]]))
print( 'instance 1 prediction;',rf.predict(instance[[1]]))
print(iris.target[100],iris.target[109])

运行结果

(150,)
[[ 6.3  3.3  6.   2.5]
 [ 7.2  3.6  6.1  2.5]]
instance 0 prediction; [ 2.]
instance 1 prediction; [ 2.]
2 2

2.2 Demo2

3种方法的比较

#random forest test
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier
X, y = make_blobs(n_samples=10000, n_features=10, centers=100,random_state=0)

clf = DecisionTreeClassifier(max_depth=None, min_samples_split=2,random_state=0)
scores = cross_val_score(clf, X, y)
print(scores.mean())    


clf = RandomForestClassifier(n_estimators=10, max_depth=None,min_samples_split=2, random_state=0)
scores = cross_val_score(clf, X, y)
print(scores.mean())    

clf = ExtraTreesClassifier(n_estimators=10, max_depth=None,min_samples_split=2, random_state=0)
scores = cross_val_score(clf, X, y)
print(scores.mean())

运行结果:

0.979408793821
0.999607843137
0.999898989899

2.3 Demo3-实现特征选择

#随机森林2
from sklearn.tree import DecisionTreeRegressor 
from sklearn.ensemble import RandomForestRegressor 
import numpy as np 
from sklearn.datasets import load_iris 
iris=load_iris() 

from sklearn.model_selection import cross_val_score, ShuffleSplit 
X = iris["data"] 
Y = iris["target"] 
names = iris["feature_names"] 
rf = RandomForestRegressor() 
scores = [] 
for i in range(X.shape[1]): 
 score = cross_val_score(rf, X[:, i:i+1], Y, scoring="r2", 
    cv=ShuffleSplit(len(X), 3, .3)) 
 scores.append((round(np.mean(score), 3), names[i])) 
print(sorted(scores, reverse=True))

运行结果:

[(0.89300000000000002, 'petal width (cm)'), (0.82099999999999995, 'petal length
(cm)'), (0.13, 'sepal length (cm)'), (-0.79100000000000004, 'sepal width (cm)')]

2.4 demo4-随机森林

本来想利用以下代码来构建随机随机森林决策树,但是,遇到的问题是,程序一直在运行,无法响应,还需要调试。

#随机森林4
#coding:utf-8 
import csv 
from random import seed 
from random import randrange 
from math import sqrt 
def loadCSV(filename):#加载数据,一行行的存入列表 
 dataSet = [] 
 with open(filename, 'r') as file: 
 csvReader = csv.reader(file) 
 for line in csvReader: 
  dataSet.append(line) 
 return dataSet 

# 除了标签列,其他列都转换为float类型 
def column_to_float(dataSet): 
 featLen = len(dataSet[0]) - 1 
 for data in dataSet: 
 for column in range(featLen): 
  data[column] = float(data[column].strip()) 

# 将数据集随机分成N块,方便交叉验证,其中一块是测试集,其他四块是训练集 
def spiltDataSet(dataSet, n_folds): 
 fold_size = int(len(dataSet) / n_folds) 
 dataSet_copy = list(dataSet) 
 dataSet_spilt = [] 
 for i in range(n_folds): 
 fold = [] 
 while len(fold) < fold_size: # 这里不能用if,if只是在第一次判断时起作用,while执行循环,直到条件不成立 
  index = randrange(len(dataSet_copy)) 
  fold.append(dataSet_copy.pop(index)) # pop() 函数用于移除列表中的一个元素(默认最后一个元素),并且返回该元素的值。 
 dataSet_spilt.append(fold) 
 return dataSet_spilt 

# 构造数据子集 
def get_subsample(dataSet, ratio): 
 subdataSet = [] 
 lenSubdata = round(len(dataSet) * ratio)#返回浮点数 
 while len(subdataSet) < lenSubdata: 
 index = randrange(len(dataSet) - 1) 
 subdataSet.append(dataSet[index]) 
 # print len(subdataSet) 
 return subdataSet 

# 分割数据集 
def data_spilt(dataSet, index, value): 
 left = [] 
 right = [] 
 for row in dataSet: 
 if row[index] < value: 
  left.append(row) 
 else: 
  right.append(row) 
 return left, right 

# 计算分割代价 
def spilt_loss(left, right, class_values): 
 loss = 0.0 
 for class_value in class_values: 
 left_size = len(left) 
 if left_size != 0: # 防止除数为零 
  prop = [row[-1] for row in left].count(class_value) / float(left_size) 
  loss += (prop * (1.0 - prop)) 
 right_size = len(right) 
 if right_size != 0: 
  prop = [row[-1] for row in right].count(class_value) / float(right_size) 
  loss += (prop * (1.0 - prop)) 
 return loss 

# 选取任意的n个特征,在这n个特征中,选取分割时的最优特征 
def get_best_spilt(dataSet, n_features): 
 features = [] 
 class_values = list(set(row[-1] for row in dataSet)) 
 b_index, b_value, b_loss, b_left, b_right = 999, 999, 999, None, None 
 while len(features) < n_features: 
 index = randrange(len(dataSet[0]) - 1) 
 if index not in features: 
  features.append(index) 
 # print 'features:',features 
 for index in features:#找到列的最适合做节点的索引,(损失最小) 
 for row in dataSet: 
  left, right = data_spilt(dataSet, index, row[index])#以它为节点的,左右分支 
  loss = spilt_loss(left, right, class_values) 
  if loss < b_loss:#寻找最小分割代价 
  b_index, b_value, b_loss, b_left, b_right = index, row[index], loss, left, right 
 # print b_loss 
 # print type(b_index) 
 return {'index': b_index, 'value': b_value, 'left': b_left, 'right': b_right} 

# 决定输出标签 
def decide_label(data): 
 output = [row[-1] for row in data] 
 return max(set(output), key=output.count) 

# 子分割,不断地构建叶节点的过程对对对 
def sub_spilt(root, n_features, max_depth, min_size, depth): 
 left = root['left'] 
 # print left 
 right = root['right'] 
 del (root['left']) 
 del (root['right']) 
 # print depth 
 if not left or not right: 
 root['left'] = root['right'] = decide_label(left + right) 
 # print 'testing' 
 return 
 if depth > max_depth: 
 root['left'] = decide_label(left) 
 root['right'] = decide_label(right) 
 return 
 if len(left) < min_size: 
 root['left'] = decide_label(left) 
 else: 
 root['left'] = get_best_spilt(left, n_features) 
 # print 'testing_left' 
 sub_spilt(root['left'], n_features, max_depth, min_size, depth + 1) 
 if len(right) < min_size: 
 root['right'] = decide_label(right) 
 else: 
 root['right'] = get_best_spilt(right, n_features) 
 # print 'testing_right' 
 sub_spilt(root['right'], n_features, max_depth, min_size, depth + 1) 

 # 构造决策树 
def build_tree(dataSet, n_features, max_depth, min_size): 
 root = get_best_spilt(dataSet, n_features) 
 sub_spilt(root, n_features, max_depth, min_size, 1) 
 return root 
# 预测测试集结果 
def predict(tree, row): 
 predictions = [] 
 if row[tree['index']] < tree['value']: 
 if isinstance(tree['left'], dict): 
  return predict(tree['left'], row) 
 else: 
  return tree['left'] 
 else: 
 if isinstance(tree['right'], dict): 
  return predict(tree['right'], row) 
 else: 
  return tree['right'] 
  # predictions=set(predictions) 
def bagging_predict(trees, row): 
 predictions = [predict(tree, row) for tree in trees] 
 return max(set(predictions), key=predictions.count) 
# 创建随机森林 
def random_forest(train, test, ratio, n_feature, max_depth, min_size, n_trees): 
 trees = [] 
 for i in range(n_trees): 
 train = get_subsample(train, ratio)#从切割的数据集中选取子集 
 tree = build_tree(train, n_features, max_depth, min_size) 
 # print 'tree %d: '%i,tree 
 trees.append(tree) 
 # predict_values = [predict(trees,row) for row in test] 
 predict_values = [bagging_predict(trees, row) for row in test] 
 return predict_values 
# 计算准确率 
def accuracy(predict_values, actual): 
 correct = 0 
 for i in range(len(actual)): 
 if actual[i] == predict_values[i]: 
  correct += 1 
 return correct / float(len(actual)) 
if __name__ == '__main__': 
 seed(1) 
 dataSet = loadCSV(r'G:\0研究生\tianchiCompetition\训练小样本2.csv') 
 column_to_float(dataSet) 
 n_folds = 5 
 max_depth = 15 
 min_size = 1 
 ratio = 1.0 
 # n_features=sqrt(len(dataSet)-1) 
 n_features = 15 
 n_trees = 10 
 folds = spiltDataSet(dataSet, n_folds)#先是切割数据集 
 scores = [] 
 for fold in folds: 
 train_set = folds[ 
   :] # 此处不能简单地用train_set=folds,这样用属于引用,那么当train_set的值改变的时候,folds的值也会改变,所以要用复制的形式。(L[:])能够复制序列,D.copy() 能够复制字典,list能够生成拷贝 list(L) 
 train_set.remove(fold)#选好训练集 
 # print len(folds) 
 train_set = sum(train_set, []) # 将多个fold列表组合成一个train_set列表 
 # print len(train_set) 
 test_set = [] 
 for row in fold: 
  row_copy = list(row) 
  row_copy[-1] = None 
  test_set.append(row_copy) 
  # for row in test_set: 
  # print row[-1] 
 actual = [row[-1] for row in fold] 
 predict_values = random_forest(train_set, test_set, ratio, n_features, max_depth, min_size, n_trees) 
 accur = accuracy(predict_values, actual) 
 scores.append(accur) 
 print ('Trees is %d' % n_trees) 
 print ('scores:%s' % scores) 
 print ('mean score:%s' % (sum(scores) / float(len(scores))))

2.5 随机森林分类sonic data

# CART on the Bank Note dataset
from random import seed
from random import randrange
from csv import reader

# Load a CSV file
def load_csv(filename):
 file = open(filename, "r")
 lines = reader(file)
 dataset = list(lines)
 return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
 for row in dataset:
 row[column] = float(row[column].strip())

# Split a dataset into k folds
def cross_validation_split(dataset, n_folds):
 dataset_split = list()
 dataset_copy = list(dataset)
 fold_size = int(len(dataset) / n_folds)
 for i in range(n_folds):
 fold = list()
 while len(fold) < fold_size:
  index = randrange(len(dataset_copy))
  fold.append(dataset_copy.pop(index))
 dataset_split.append(fold)
 return dataset_split

# Calculate accuracy percentage
def accuracy_metric(actual, predicted):
 correct = 0
 for i in range(len(actual)):
 if actual[i] == predicted[i]:
  correct += 1
 return correct / float(len(actual)) * 100.0

# Evaluate an algorithm using a cross validation split
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
 folds = cross_validation_split(dataset, n_folds)
 scores = list()
 for fold in folds:
 train_set = list(folds)
 train_set.remove(fold)
 train_set = sum(train_set, [])
 test_set = list()
 for row in fold:
  row_copy = list(row)
  test_set.append(row_copy)
  row_copy[-1] = None
 predicted = algorithm(train_set, test_set, *args)
 actual = [row[-1] for row in fold]
 accuracy = accuracy_metric(actual, predicted)
 scores.append(accuracy)
 return scores

# Split a data set based on an attribute and an attribute value
def test_split(index, value, dataset):
 left, right = list(), list()
 for row in dataset:
 if row[index] < value:
  left.append(row)
 else:
  right.append(row)
 return left, right

# Calculate the Gini index for a split dataset
def gini_index(groups, class_values):
 gini = 0.0
 for class_value in class_values:
 for group in groups:
  size = len(group)
  if size == 0:
  continue
  proportion = [row[-1] for row in group].count(class_value) / float(size)
  gini += (proportion * (1.0 - proportion))
 return gini

# Select the best split point for a dataset
def get_split(dataset):
 class_values = list(set(row[-1] for row in dataset))
 b_index, b_value, b_score, b_groups = 999, 999, 999, None
 for index in range(len(dataset[0])-1):
 for row in dataset:
  groups = test_split(index, row[index], dataset)
  gini = gini_index(groups, class_values)
  if gini < b_score:
  b_index, b_value, b_score, b_groups = index, row[index], gini, groups
 print ({'index':b_index, 'value':b_value})
 return {'index':b_index, 'value':b_value, 'groups':b_groups}

# Create a terminal node value
def to_terminal(group):
 outcomes = [row[-1] for row in group]
 return max(set(outcomes), key=outcomes.count)

# Create child splits for a node or make terminal
def split(node, max_depth, min_size, depth):
 left, right = node['groups']
 del(node['groups'])
 # check for a no split
 if not left or not right:
 node['left'] = node['right'] = to_terminal(left + right)
 return
 # check for max depth
 if depth >= max_depth:
 node['left'], node['right'] = to_terminal(left), to_terminal(right)
 return
 # process left child
 if len(left) <= min_size:
 node['left'] = to_terminal(left)
 else:
 node['left'] = get_split(left)
 split(node['left'], max_depth, min_size, depth+1)
 # process right child
 if len(right) <= min_size:
 node['right'] = to_terminal(right)
 else:
 node['right'] = get_split(right)
 split(node['right'], max_depth, min_size, depth+1)

# Build a decision tree
def build_tree(train, max_depth, min_size):
 root = get_split(train)
 split(root, max_depth, min_size, 1)
 return root

# Make a prediction with a decision tree
def predict(node, row):
 if row[node['index']] < node['value']:
 if isinstance(node['left'], dict):
  return predict(node['left'], row)
 else:
  return node['left']
 else:
 if isinstance(node['right'], dict):
  return predict(node['right'], row)
 else:
  return node['right']

# Classification and Regression Tree Algorithm
def decision_tree(train, test, max_depth, min_size):
 tree = build_tree(train, max_depth, min_size)
 predictions = list()
 for row in test:
 prediction = predict(tree, row)
 predictions.append(prediction)
 return(predictions)

# Test CART on Bank Note dataset
seed(1)
# load and prepare data
filename = r'G:\0pythonstudy\决策树\sonar.all-data.csv'
dataset = load_csv(filename)
# convert string attributes to integers
for i in range(len(dataset[0])-1):
 str_column_to_float(dataset, i)
# evaluate algorithm
n_folds = 5
max_depth = 5
min_size = 10
scores = evaluate_algorithm(dataset, decision_tree, n_folds, max_depth, min_size)
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

运行结果:

{'index': 38, 'value': 0.0894}
{'index': 36, 'value': 0.8459}
{'index': 50, 'value': 0.0024}
{'index': 15, 'value': 0.0906}
{'index': 16, 'value': 0.9819}
{'index': 10, 'value': 0.0785}
{'index': 16, 'value': 0.0886}
{'index': 38, 'value': 0.0621}
{'index': 5, 'value': 0.0226}
{'index': 8, 'value': 0.0368}
{'index': 11, 'value': 0.0754}
{'index': 0, 'value': 0.0239}
{'index': 8, 'value': 0.0368}
{'index': 29, 'value': 0.1671}
{'index': 46, 'value': 0.0237}
{'index': 38, 'value': 0.0621}
{'index': 14, 'value': 0.0668}
{'index': 4, 'value': 0.0167}
{'index': 37, 'value': 0.0836}
{'index': 12, 'value': 0.0616}
{'index': 7, 'value': 0.0333}
{'index': 33, 'value': 0.8741}
{'index': 16, 'value': 0.0886}
{'index': 8, 'value': 0.0368}
{'index': 33, 'value': 0.0798}
{'index': 44, 'value': 0.0298}
Scores: [48.78048780487805, 70.73170731707317, 58.536585365853654, 51.2195121951
2195, 39.02439024390244]
Mean Accuracy: 53.659%
请按任意键继续. . .

知识点:

1.load CSV file

from csv import reader
# Load a CSV file
def load_csv(filename):
 file = open(filename, "r")
 lines = reader(file)
 dataset = list(lines)
 return dataset
filename = r'G:\0pythonstudy\决策树\sonar.all-data.csv'
dataset=load_csv(filename)
print(dataset)

2.把数据转化成float格式

# Convert string column to float
def str_column_to_float(dataset, column):
 for row in dataset:
 row[column] = float(row[column].strip())
 # print(row[column])
# convert string attributes to integers
for i in range(len(dataset[0])-1):
 str_column_to_float(dataset, i)

3.把最后一列的分类字符串转化成0、1整数

def str_column_to_int(dataset, column):
 class_values = [row[column] for row in dataset]#生成一个class label的list
 # print(class_values)
 unique = set(class_values)#set 获得list的不同元素
 print(unique)
 lookup = dict()#定义一个字典
 # print(enumerate(unique))
 for i, value in enumerate(unique):
 lookup[value] = i
 # print(lookup)
 for row in dataset:
 row[column] = lookup[row[column]]
 print(lookup['M'])

4、把数据集分割成K份

# Split a dataset into k folds
def cross_validation_split(dataset, n_folds):
 dataset_split = list()#生成空列表
 dataset_copy = list(dataset)
 print(len(dataset_copy))
 print(len(dataset))
 #print(dataset_copy)
 fold_size = int(len(dataset) / n_folds)
 for i in range(n_folds):
 fold = list()
 while len(fold) < fold_size:
  index = randrange(len(dataset_copy))
  # print(index)
  fold.append(dataset_copy.pop(index))#使用.pop()把里边的元素都删除(相当于转移),这k份元素各不相同。
 dataset_split.append(fold)
 return dataset_split
n_folds=5 
folds = cross_validation_split(dataset, n_folds)#k份元素各不相同的训练集

5.计算正确率

# Calculate accuracy percentage
def accuracy_metric(actual, predicted):
 correct = 0
 for i in range(len(actual)):
 if actual[i] == predicted[i]:
  correct += 1
 return correct / float(len(actual)) * 100.0#这个是二值分类正确性的表达式

6.二分类每列

# Split a data set based on an attribute and an attribute value
def test_split(index, value, dataset):
 left, right = list(), list()#初始化两个空列表
 for row in dataset:
 if row[index] < value:
  left.append(row)
 else:
  right.append(row)
 return left, right #返回两个列表,每个列表以value为界限对指定行(index)进行二分类。

7.使用gini系数来获得最佳分割点

# Calculate the Gini index for a split dataset
def gini_index(groups, class_values):
 gini = 0.0
 for class_value in class_values:
 for group in groups:
  size = len(group)
  if size == 0:
  continue
  proportion = [row[-1] for row in group].count(class_value) / float(size)
  gini += (proportion * (1.0 - proportion))
 return gini

# Select the best split point for a dataset
def get_split(dataset):
 class_values = list(set(row[-1] for row in dataset))
 b_index, b_value, b_score, b_groups = 999, 999, 999, None
 for index in range(len(dataset[0])-1):
 for row in dataset:
  groups = test_split(index, row[index], dataset)
  gini = gini_index(groups, class_values)
  if gini < b_score:
  b_index, b_value, b_score, b_groups = index, row[index], gini, groups
 # print(groups)
 print ({'index':b_index, 'value':b_value,'score':gini})
 return {'index':b_index, 'value':b_value, 'groups':b_groups}

这段代码,在求gini指数,直接应用定义式,不难理解。获得最佳分割点可能比较难看懂,这里用了两层迭代,一层是对不同列的迭代,一层是对不同行的迭代。并且,每次迭代,都对gini系数进行更新。

8、决策树生成

# Create child splits for a node or make terminal
def split(node, max_depth, min_size, depth):
 left, right = node['groups']
 del(node['groups'])
 # check for a no split
 if not left or not right:
 node['left'] = node['right'] = to_terminal(left + right)
 return
 # check for max depth
 if depth >= max_depth:
 node['left'], node['right'] = to_terminal(left), to_terminal(right)
 return
 # process left child
 if len(left) <= min_size:
 node['left'] = to_terminal(left)
 else:
 node['left'] = get_split(left)
 split(node['left'], max_depth, min_size, depth+1)
 # process right child
 if len(right) <= min_size:
 node['right'] = to_terminal(right)
 else:
 node['right'] = get_split(right)
 split(node['right'], max_depth, min_size, depth+1)

这里使用了递归编程,不断生成左叉树和右叉树。

9.构建决策树

# Build a decision tree
def build_tree(train, max_depth, min_size):
 root = get_split(train)
 split(root, max_depth, min_size, 1)
 return root 
tree=build_tree(train_set, max_depth, min_size)
print(tree)

10、预测test集

# Build a decision tree
def build_tree(train, max_depth, min_size):
 root = get_split(train)#获得最好的分割点,下标值,groups
 split(root, max_depth, min_size, 1)
 return root 
# tree=build_tree(train_set, max_depth, min_size)
# print(tree) 

# Make a prediction with a decision tree
def predict(node, row):
 print(row[node['index']])
 print(node['value'])
 if row[node['index']] < node['value']:#用测试集来代入训练的最好分割点,分割点有偏差时,通过搜索左右叉树来进一步比较。
 if isinstance(node['left'], dict):#如果是字典类型,执行操作
  return predict(node['left'], row)
 else:
  return node['left']
 else:
 if isinstance(node['right'], dict):
  return predict(node['right'], row)
 else:
  return node['right']
tree = build_tree(train_set, max_depth, min_size)
predictions = list()
for row in test_set:
 prediction = predict(tree, row)
 predictions.append(prediction)

11.评价决策树

# Evaluate an algorithm using a cross validation split
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
 folds = cross_validation_split(dataset, n_folds)
 scores = list()
 for fold in folds:
 train_set = list(folds)
 train_set.remove(fold)
 train_set = sum(train_set, [])
 test_set = list()
 for row in fold:
  row_copy = list(row)
  test_set.append(row_copy)
  row_copy[-1] = None
 predicted = algorithm(train_set, test_set, *args)
 actual = [row[-1] for row in fold]
 accuracy = accuracy_metric(actual, predicted)
 scores.append(accuracy)
 return scores

以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持三水点靠木。

Python 相关文章推荐
python实现爬虫下载漫画示例
Feb 16 Python
用Python代码来绘制彭罗斯点阵的教程
Apr 03 Python
黑科技 Python脚本帮你找出微信上删除你好友的人
Jan 07 Python
Python中多线程的创建及基本调用方法
Jul 08 Python
python+pandas+时间、日期以及时间序列处理方法
Jul 10 Python
python 在屏幕上逐字显示一行字的实例
Dec 24 Python
Python生命游戏实现原理及过程解析(附源代码)
Aug 01 Python
Python 使用matplotlib模块模拟掷骰子
Aug 08 Python
python实现邮件发送功能
Aug 10 Python
使用Python画了一棵圣诞树的实例代码
Nov 27 Python
python基于pexpect库自动获取日志信息
Feb 01 Python
Python通过loop.run_in_executor执行同步代码 同步变为异步
Apr 11 Python
python编写分类决策树的代码
Dec 21 #Python
Python基于PyGraphics包实现图片截取功能的方法
Dec 21 #Python
用Python写王者荣耀刷金币脚本
Dec 21 #Python
python使用Apriori算法进行关联性解析
Dec 21 #Python
python实现kMeans算法
Dec 21 #Python
利用Tkinter(python3.6)实现一个简单计算器
Dec 21 #Python
python编写朴素贝叶斯用于文本分类
Dec 21 #Python
You might like
再说下636单管机
2021/03/02 无线电
通过PHP current函数获取未知字符键名数组第一个元素的值
2013/06/24 PHP
php实现利用phpexcel导出数据
2013/08/24 PHP
PHP实现的带超时功能get_headers函数
2015/02/10 PHP
PHP中常见的缓存技术实例分析
2015/09/23 PHP
PHP unlink与rmdir删除目录及目录下所有文件实例代码
2018/02/07 PHP
PHP实现将base64编码字符串转换成图片示例
2018/06/22 PHP
JavaScript Event学习第九章 鼠标事件
2010/02/08 Javascript
JavaScript 类的定义和引用 JavaScript高级培训 自定义对象
2010/04/27 Javascript
Js 导出table内容到Excel的简单实例
2013/11/19 Javascript
html文本框提示效果的示例代码
2014/06/28 Javascript
javascript实现右侧弹出“分享到”窗口效果
2016/02/01 Javascript
基于React.js实现原生js拖拽效果引发的思考
2016/03/30 Javascript
JavaScript中各种引用类型的常用操作方法小结
2016/05/05 Javascript
JS查找字符串中出现最多的字符及个数统计
2017/02/04 Javascript
详解bootstrap导航栏.nav与.navbar区别
2017/11/23 Javascript
Angular学习教程之RouterLink花式跳转
2018/05/03 Javascript
JS构造一个html文本内容成文件流形式发送到后台
2018/07/31 Javascript
JavaScript实现连连看连线算法
2019/01/05 Javascript
关于layui导航栏不展示下拉列表的解决方法
2019/09/25 Javascript
JS数组进阶示例【数组的几种函数用法】
2020/01/16 Javascript
[03:27]最受玩家喜爱奖提名:PZH_Element 致玩家寄语
2016/12/20 DOTA
[00:35]2016完美“圣”典风云人物:冷冷宣传片
2016/12/08 DOTA
python 不关闭控制台的实现方法
2011/10/23 Python
一个简单的python程序实例(通讯录)
2013/11/29 Python
python模仿网页版微信发送消息功能
2018/02/24 Python
python学习入门细节知识点
2018/03/29 Python
python中for用来遍历range函数的方法
2018/06/08 Python
Python函数装饰器常见使用方法实例详解
2019/03/30 Python
HTML5的表单(绝对特别强大的功能)使用示例
2013/06/20 HTML / CSS
澳大利亚现代波西米亚风格女装网站:Bohemian Traders
2018/04/16 全球购物
幼儿园教师节活动方案
2014/02/02 职场文书
2014年民政局关于保密工作整改措施
2014/09/19 职场文书
毕业生见习报告总结
2014/11/08 职场文书
金融专业银行实习证明模板
2014/11/28 职场文书
2021好看的国漫排行榜前十名 《完美世界》上榜,《元龙》排名第一
2022/03/18 国漫