Third-Party Python Libraries

Backtraxe, filed under Python

Published 2021-08-15, updated 2021-12-13

Warning

This article was last updated on 2021-12-13; its content may be out of date.

Commonly used third-party Python libraries.

Machine Learning

scikit-learn: machine learning in Python. 🌟48.1k
XGBoost: Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow. 🌟21.9k
LightGBM: A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks. 🌟13.2k
CatBoost: A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU. 🌟6.2k
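
As a rough illustration of the fit/score workflow that scikit-learn provides, here is a minimal sketch. The dataset (the bundled iris data), the decision-tree model, and all variable names are illustrative choices, not anything prescribed by this list.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small dataset that ships with scikit-learn (no download needed).
features, labels = load_iris(return_X_y=True)

# Hold out a quarter of the rows for evaluation; the seed is an arbitrary choice.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

# A shallow decision tree, chosen here only because it trains instantly.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Mean accuracy on the held-out rows.
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

XGBoost, LightGBM, and CatBoost each offer estimator classes that follow a similar fit/predict convention, though their parameters differ.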

TensorFlow: TensorFlow is an end-to-end open source platform for machine learning. 🌟161k
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration. 🌟52.4k
MXNet: Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more. 🌟19.8k
PaddlePaddle: PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice. 🌟17.1k
JAX: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more. 🌟15.2k

auto-sklearn: Automated Machine Learning with scikit-learn. 🌟5,907
PyCaret: An open-source, low-code machine learning library in Python. 🌟4,541
Auto-PyTorch: Automatic architecture search and hyperparameter optimization for PyTorch. 🌟1,464

Prophet: Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth. 🌟13,809
tsfresh: The package provides systematic time-series feature extraction by combining established algorithms from statistics, time-series analysis, signal processing, and nonlinear dynamics with a robust feature selection algorithm. 🌟6,074
sktime: A unified framework for machine learning with time series. 🌟4,721
Kats: Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, detecting change points and anomalies, to forecasting future trends. 🌟3,308
Darts: A Python library for easy manipulation and forecasting of time series. 🌟3,208
GluonTS: Probabilistic time series modeling in Python. 🌟2,351
Merlion: A Machine Learning Framework for Time Series Intelligence. 🌟2,275
tslearn: A machine learning toolkit dedicated to time-series data. 🌟1,904
tsai: tsai is an open-source deep learning package built on top of PyTorch & fastai focused on state-of-the-art techniques for time series tasks like classification, regression, forecasting, imputation… 🌟1,443
Greykite: The Greykite library provides flexible, intuitive and fast forecasts through its flagship algorithm, Silverkite. 🌟1,358
mcfly: A deep learning tool for time series classification. 🌟337

Face Recognition: The world’s simplest facial recognition API for Python and the command line. 🌟42.3k

Transformers: State-of-the-art Natural Language Processing for PyTorch, TensorFlow, and JAX. 🌟54.9k
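jieba: "Jieba" Chinese text segmentation: built to be the best Python Chinese word segmentation module. 🌟27.4k
HanLP: Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing. 🌟24.5k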
spaCy: Industrial-strength Natural Language Processing (NLP) in Python. 🌟21.9k
AllenNLP: An open-source NLP research library, built on PyTorch. 🌟10.7k
NLTK: NLTK – the Natural Language Toolkit – is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. 🌟10.3k
TextBlob: Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more. 🌟8k
fastNLP: fastNLP is a lightweight framework for natural language processing (NLP) that aims to make it quick to implement NLP tasks and build complex models. 🌟2.4k
textacy: NLP, before and after spaCy. 🌟1.8k
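xmnlp: Provides Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, text correction, text-to-pinyin conversion, text summarization, radical lookup, and more. 🌟736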

NumPy: The fundamental package for scientific computing with Python. 🌟18.9k
SciPy: SciPy is an open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more. 🌟8.9k
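
To give a feel for the array-oriented style NumPy encourages, the sketch below standardizes the columns of a small matrix without an explicit loop. The function name `standardize` and the sample data are hypothetical, picked only for illustration.

```python
import numpy as np


def standardize(matrix: np.ndarray) -> np.ndarray:
    """Center each column at zero mean and scale it to unit standard deviation."""
    mean = matrix.mean(axis=0)  # per-column mean
    std = matrix.std(axis=0)    # per-column (population) standard deviation
    # Broadcasting applies the per-column values to every row.
    return (matrix - mean) / std


# Hypothetical sample data: three rows, two columns on different scales.
data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaled = standardize(data)
print(scaled)
```

This sketch assumes no column is constant; a constant column has zero standard deviation and would need separate handling.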

pandas: Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more. 🌟31.8k
pdfminer.six: It is a tool for extracting information from PDF documents. 🌟3.2k
openpyxl: openpyxl is a Python library to read/write Excel 2010 xlsx/xlsm/xltx/xltm files.
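
The "labeled data structures" that pandas describes can be sketched with a small grouped aggregation. The column names and figures below are made up for the example.

```python
import pandas as pd

# Hypothetical sales records; the column names are illustrative only.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "units": [10, 5, 7, 3],
    }
)

# Group rows by label and sum the numeric column within each group.
totals = sales.groupby("region")["units"].sum()
print(totals)
```

The result is a Series indexed by region, so individual totals can be looked up by label, for example `totals["north"]`.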

Matplotlib: Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. 🌟14.6k
seaborn: Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics. 🌟9k

Requests: A simple, yet elegant, HTTP library. 🌟46.5k
Scrapy: Scrapy, a fast high-level web crawling & scraping framework for Python. 🌟42.2k
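
As a small sketch of how Requests models an HTTP call, the example below builds and inspects a request without sending it, so it runs offline. The URL, query parameter, and the helper name `build_search_request` are placeholders assumed for illustration.

```python
import requests


def build_search_request(base_url: str, query: str) -> requests.PreparedRequest:
    """Build (but do not send) a GET request with a query string and a header."""
    request = requests.Request(
        "GET",
        base_url,
        params={"q": query},                    # encoded into the URL's query string
        headers={"Accept": "application/json"},
    )
    return request.prepare()


# Placeholder URL and query; nothing is sent over the network here.
prepared = build_search_request("https://example.com/search", "python libraries")
print(prepared.method, prepared.url)
```

To actually send it, one option is `requests.Session().send(prepared, timeout=10)`. For simple cases, `requests.get(url, params=..., timeout=...)` does the same in a single call.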