alecor.net

2026-02-08

LLM and RAG Evaluation: Metrics, Best Practices

This article provides a concise reference for evaluating large language models (LLMs) and retrieval-augmented generation (RAG) systems. It covers core metrics such as accuracy, F1, BLEU, ROUGE, and perplexity. It also highlights recent advances in safety and alignment evaluation, semantic evaluation, hallucination detection, operational metrics, and multilingual performance. A practical bullet-point workflow is included for both research and production settings.

posted 2026-02-08 · Data Science · Python, Data Science, AI Products, MLOps, LLMs, RAG, Evaluation, NLP

2026-02-07

Core Tools in the Modern Python Data Analytics Stack

Modern data and AI products are built on a small set of recurring Python tools for data processing, visualization, interfaces, and APIs. This article provides a concise conceptual overview of pandas, Polars, visualization libraries, dashboard frameworks, and backend web frameworks, highlighting how they fit together in real-world systems.

posted 2026-02-07 · Data Science · Python, Data Science, Visualization, Dashboards, APIs, AI Products, MLOps