In Python, missing values in a pandas DataFrame can be handled in several ways. Commonly used methods include:
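1. Drop rows or columns that contain missing values
2. Fill missing values with a constant
3. Fill missing values with the mean
4. Fill missing values with the median
5. Fill missing values with the mode
6. Fill missing values by interpolation
7. Forward fill and backward fill
8. Fill missing values with k-nearest neighbors (KNN)
9. Fill missing values with multiple imputation
The implementations are shown below. Each example reads its data from a file named data.csv.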
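1. Drop rows or columns that contain missing values

import pandas as pd

# Read the data
data = pd.read_csv('data.csv')

# Option A: drop every row that contains at least one missing value
rows_dropped = data.dropna(axis=0)

# Option B: drop every column that contains at least one missing value
cols_dropped = data.dropna(axis=1)

The two calls are alternatives, not consecutive steps: once rows with missing values have been dropped in place, no missing values remain, so a following column drop would remove nothing. Dropping rows keeps every feature but loses observations; dropping columns keeps every observation but loses whole features.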
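To see the difference between the two options, here is a minimal sketch on a small, made-up DataFrame (the column names and values are illustrative only). It also shows the thresh parameter of dropna, which keeps rows that have at least that many non-missing values.

```python
import numpy as np
import pandas as pd

# Small illustrative frame: columns 'a' and 'b' each contain one gap
df = pd.DataFrame({
    'a': [1.0, np.nan, 3.0],
    'b': [4.0, 5.0, np.nan],
    'c': [7.0, 8.0, 9.0],
})

# Drop rows with any missing value: only row 0 is complete
print(df.dropna(axis=0))

# Drop columns with any missing value: only column 'c' is complete
print(df.dropna(axis=1))

# Keep rows with at least 2 non-missing values (all three rows qualify)
print(df.dropna(thresh=2))
```

Rows 1 and 2 each have one gap, so the row-wise drop discards two thirds of this tiny dataset while the column-wise drop discards two of three features.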
2. Fill missing values with a constant

import pandas as pd

# Read the data
data = pd.read_csv('data.csv')

# Replace every missing value with the constant 0
data = data.fillna(0)
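A single constant rarely suits every column. fillna also accepts a dictionary mapping column names to fill values, so numeric and text columns can get different placeholders. A minimal sketch; the columns age and city and the placeholder 'unknown' are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one numeric and one text column
df = pd.DataFrame({
    'age': [25.0, np.nan, 40.0],
    'city': ['Beijing', None, 'Shanghai'],
})

# Use a different constant per column; columns not in the dict are left untouched
filled = df.fillna({'age': 0, 'city': 'unknown'})
print(filled)
```

fillna returns a new DataFrame here, so df itself still contains its gaps.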
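3. Fill missing values with the mean

import pandas as pd

# Read the data
data = pd.read_csv('data.csv')

# Fill each numeric column's missing values with that column's mean.
# numeric_only=True skips text columns, for which a mean is undefined;
# without it, pandas 2.x raises a TypeError if such columns exist.
data = data.fillna(data.mean(numeric_only=True))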
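The following minimal sketch, on a made-up frame with one numeric and one text column, shows what the mean fill does: the column means form a Series indexed by column name, and fillna only fills the columns that appear in it.

```python
import numpy as np
import pandas as pd

# Illustrative frame with one numeric and one text column
df = pd.DataFrame({
    'score': [80.0, np.nan, 90.0, 70.0],
    'name': ['A', 'B', None, 'D'],
})

# Mean of the numeric column only: (80 + 90 + 70) / 3 = 80.0
means = df.mean(numeric_only=True)

# 'score' is filled with 80.0; 'name' is not in `means`, so its gap stays
filled = df.fillna(means)
print(filled)
```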
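4. Fill missing values with the median

import pandas as pd

# Read the data
data = pd.read_csv('data.csv')

# Fill each numeric column's missing values with that column's median
# (numeric_only=True skips text columns, as in the mean example)
data = data.fillna(data.median(numeric_only=True))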
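The median is preferred over the mean when a column contains outliers: one extreme value pulls the mean far away but barely moves the median. A small comparison on made-up income figures:

```python
import numpy as np
import pandas as pd

# Made-up incomes with one extreme outlier and one gap
income = pd.Series([3000.0, 3500.0, 4000.0, np.nan, 100000.0])

# Mean of the known values: (3000 + 3500 + 4000 + 100000) / 4 = 27625.0
print(income.fillna(income.mean()).iloc[3])

# Median of the known values: (3500 + 4000) / 2 = 3750.0
print(income.fillna(income.median()).iloc[3])
```

Filling with 27625.0 would invent a value larger than almost every real observation; 3750.0 sits among the typical values.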
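5. Fill missing values with the mode

import pandas as pd

# Read the data
data = pd.read_csv('data.csv')

# Fill each column's missing values with that column's most frequent value.
# Series.mode() ignores missing values and may return several values on a tie,
# so take the first one; skip columns that contain no values at all.
for column in data.columns:
    modes = data[column].mode()
    if not modes.empty:
        data[column] = data[column].fillna(modes.iloc[0])

This version uses pandas' own Series.mode() instead of scipy.stats.mode: indexing the SciPy result with [0][0] breaks in recent SciPy releases, whose return shape changed, and pandas' method also works on text columns. Assigning the result back also avoids calling fillna(inplace=True) on a column selection, which newer pandas versions warn about.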
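The mode is the usual choice for categorical columns. A minimal sketch on a made-up column shows the tie-breaking behavior: when several values are equally frequent, Series.mode() returns all of them in sorted order, and taking the first picks the smallest.

```python
import pandas as pd

# Made-up categorical column with one gap
colors = pd.Series(['red', 'blue', 'red', None, 'blue', 'green'])

# 'red' and 'blue' both appear twice, so mode() returns both, sorted
modes = colors.mode()
print(modes.tolist())  # ['blue', 'red']

# Taking the first mode fills the gap with 'blue'
filled = colors.fillna(modes.iloc[0])
print(filled.tolist())
```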
6. Fill missing values by interpolation (linear)

import pandas as pd

# Read the data
data = pd.read_csv('data.csv')

# Linearly interpolate the numeric columns: each gap is filled with evenly
# spaced values between the nearest known values before and after it.
numeric_cols = data.select_dtypes(include='number').columns
data[numeric_cols] = data[numeric_cols].interpolate(method='linear')

Interpolation only makes sense for numeric columns whose row order is meaningful, such as time series. Gaps before the first known value in a column stay missing by default.
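A minimal sketch of linear interpolation on made-up readings, showing how each gap is filled along a straight line between its neighbors:

```python
import numpy as np
import pandas as pd

# Made-up daily readings with gaps
temps = pd.Series([10.0, np.nan, np.nan, 16.0, np.nan, 20.0])

# 10 -> 16 over three steps gives 12 and 14; 16 -> 20 over two steps gives 18
filled = temps.interpolate(method='linear')
print(filled.tolist())  # [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
```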
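7. Forward fill and backward fill

A minimal sketch using pandas' ffill() and bfill() methods. Forward fill copies the last known value down into the gap; backward fill copies the next known value up. Older code often writes fillna(method='ffill'), a form deprecated since pandas 2.1. The data below are made up.

```python
import numpy as np
import pandas as pd

# Made-up time-ordered readings with gaps at the start, middle, and end
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

# Forward fill: each gap takes the last value before it (the leading gap stays)
forward = s.ffill()

# Backward fill: each gap takes the next value after it (the trailing gap stays)
backward = s.bfill()

# Chaining both also covers leading and trailing gaps
both = s.ffill().bfill()
print(both.tolist())  # [1.0, 1.0, 1.0, 1.0, 4.0, 4.0]
```

Like interpolation, these fills assume the row order is meaningful, so sort by time first if needed.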
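8. Fill missing values with k-nearest neighbors (KNN)

A minimal sketch assuming scikit-learn is installed, using its KNNImputer. Each missing entry is replaced by the mean of that feature over the n_neighbors most similar rows, with similarity measured on the features both rows have observed. The array below is made up, and n_neighbors=2 is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Made-up numeric data; row 1 is missing its second feature
X = np.array([
    [1.0, 2.0],
    [1.1, np.nan],
    [1.2, 2.4],
    [8.0, 9.0],
])

# Rows 0 and 2 are the two nearest neighbors of row 1 on the first feature,
# so the gap becomes the mean of their second features, 2.0 and 2.4
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled[1, 1])  # approximately 2.2
```

Because KNN relies on distances, features on very different scales should usually be standardized before imputation.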
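9. Fill missing values with multiple imputation

A minimal sketch assuming scikit-learn is installed. Its IterativeImputer (still experimental, so it must be enabled by an explicit import) models each feature from the others, in the spirit of MICE. Running it several times with sample_posterior=True and different random seeds yields several plausible completed datasets, which is the core idea of multiple imputation. A full analysis would fit the downstream model on each dataset and pool the results; this sketch only produces the imputations. The data, the five draws, and max_iter=10 are illustrative choices.

```python
import numpy as np
# IterativeImputer is experimental in scikit-learn and must be enabled first
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Made-up data where the second column is roughly twice the first
X = np.array([
    [1.0, 2.0],
    [2.0, 4.1],
    [3.0, np.nan],
    [4.0, 7.9],
    [np.nan, 10.0],
])

# Draw several completed datasets; sample_posterior=True samples each filled
# value from the model's predictive distribution instead of using a point estimate
imputations = []
for seed in range(5):
    imputer = IterativeImputer(sample_posterior=True, random_state=seed, max_iter=10)
    imputations.append(imputer.fit_transform(X))

# The spread across draws reflects the uncertainty about each missing entry
stacked = np.stack(imputations)  # shape (5, 5, 2)
print(stacked[:, 2, 1])          # five draws for the gap in row 2
```

Single imputation (methods 2 to 8) treats filled values as if they were observed; keeping several draws lets later analysis account for that extra uncertainty.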
Original article by 未希. If you reproduce it, please credit the source: https://www.kdun.com/ask/447419.html