alecor.net

2023-02-27

List of Python libraries for Data Science & Machine Learning

Summary:

A small curated list of Python libraries and frameworks used in Data Science, Quantitative Development and other technical fields. Many of them are useful in other domains as well.

Here is a short curated list of some of the most commonly used Python frameworks and libraries in Machine Learning and Data Science, in no particular order:

  • NumPy - a library for working with arrays and numerical operations
  • Pandas - a library for data manipulation and analysis
  • Scikit-learn - a library for machine learning algorithms and tools
  • TensorFlow - an open-source software library for dataflow and differentiable programming
  • PyTorch - an open-source machine learning framework for building and training neural networks
  • Keras - a high-level neural networks API written in Python; it originally ran on top of TensorFlow, Theano, or CNTK, but Theano and CNTK are no longer maintained, and modern Keras targets TensorFlow, JAX, or PyTorch
  • Matplotlib - a plotting library for creating static, animated, and interactive visualizations in Python
  • Seaborn - a visualization library based on Matplotlib, designed for statistical graphics
  • SciPy - a library for scientific and technical computing (optimization, integration, interpolation, linear algebra, signal processing, statistics)
  • Statsmodels - a library for statistical modeling and testing
  • NLTK - a natural language processing library for Python
  • Gensim - a library for topic modeling and document similarity
  • spaCy - a library for natural language processing and text analysis
  • OpenCV - a computer vision library for real-time image processing and object recognition
  • H2O - an open-source machine learning platform that supports distributed computing and big data
  • XGBoost - an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable
  • LightGBM - a gradient boosting framework that uses tree-based learning algorithms
  • CatBoost - a gradient boosting library on decision trees with native support for categorical features
  • Dask - a library for parallel computing in Python
  • Vaex - a library for lazy and out-of-memory data processing and visualization
  • PySpark - the Python API for Apache Spark, used for distributed data processing
  • Altair - a declarative visualization library for creating interactive visualizations in Python
  • Plotly - a visualization library for creating interactive web-based visualizations
  • Bokeh - a library for creating interactive visualizations and data applications in the browser
  • Dash - a web application framework for building analytical applications in pure Python, built on Flask, Plotly.js and React
  • Streamlit - an open-source framework for creating interactive web applications for machine learning and data science
  • Prophet - a forecasting library for time series data
  • TensorFlow Probability - a library for probabilistic modeling and Bayesian inference in TensorFlow
  • PyMC (formerly PyMC3) - a Python library for Bayesian modeling and probabilistic programming
  • Scrapy - a web crawling and scraping framework for Python
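
To give a feel for how a few of these libraries fit together, here is a minimal sketch that uses NumPy for array arithmetic, pandas for a labelled table and scikit-learn for fitting a model. It assumes numpy, pandas and scikit-learn are installed; the toy data (an exact line y = 3x + 2) and the variable names are made up purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# NumPy: vectorized arithmetic on whole arrays, no explicit Python loop.
# Toy data (illustrative only): an exact line y = 3x + 2 with no noise.
x = np.arange(10, dtype=float)   # 0.0, 1.0, ..., 9.0
y = 3.0 * x + 2.0

# pandas: wrap the arrays in a labelled DataFrame and summarise the columns.
df = pd.DataFrame({"x": x, "y": y})
print(df.describe().loc[["mean", "std"]])

# scikit-learn: the estimator API is fit(...) then predict(...).
# Features must be 2-D (a DataFrame with one column), the target 1-D (a Series).
model = LinearRegression()
model.fit(df[["x"]], df["y"])
print(model.coef_[0], model.intercept_)  # approximately 3.0 and 2.0

# Predict for a new point, using the same column name as during fitting.
print(model.predict(pd.DataFrame({"x": [10.0]})))  # approximately [32.]
```

The fit/predict pattern is shared across scikit-learn estimators, and XGBoost, LightGBM and CatBoost also ship scikit-learn-compatible classes (for example `xgboost.XGBRegressor`, `lightgbm.LGBMRegressor`, `catboost.CatBoostRegressor`), so they can usually be swapped in with few changes.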

This is not an exhaustive list; many other libraries and frameworks can be useful in Machine Learning and Data Science. The popularity and usage of individual libraries also vary by task, application, and industry.

Nothing you read here should be considered advice or a recommendation. Everything is for informational purposes only.