C++

Streaming Percentiles 3.1.0 Released
Analytics c++ ckms gk javascript percentiles streaming
Published: 2019-03-29
Streaming Percentiles 3.1.0 Released

Version 3.1.0 of the streaming percentiles library has been released.

The streaming percentiles library is a cross-platform, multi-language (C++ and JavaScript) implementation of a number of online (single-pass) percentile algorithms. This version of the streaming percentiles library adds support for copy construction, assignment, move construction, and move assignment on all analytics classes.

This change allows you to put streaming analytics classes into STL containers, such as:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
#include <vector>
#include <stmpct/gk.hpp>

std::vector<stmpct::gk<double>> algs;
algs.emplace_back(0.01);
algs.emplace_back(0.05);
for (auto& g : algs) {
    for (int i = 0; i < 1000; ++i) {
        g.insert(rand());
    }

    double p95 = g.quantile(0.95); // Approx. 95th percentile for current epsilon
}

It also allows you to efficiently move the state of an algorithm from one object to another without copying, as in:

Read more...
Calculating Percentiles on Streaming Data Part 8: Parameterizing Algorithms on Measurement Type
Calculating Percentiles on Streaming Data c++ ckms gk javascript percentiles streaming
Published: 2018-12-21
Calculating Percentiles on Streaming Data Part 8: Parameterizing Algorithms on Measurement Type

This is part 8/8 of my Calculating Percentiles on Streaming Data series.

As mentioned in part 6 of this series, I have published a C++ and JavaScript library which implements a number of streaming percentile algorithms on GitHub at https://github.com/sengelha/streaming-percentiles.

Versions 1.x and 2.x of the C++ library required all measurements to use the type double, and usage of the algorithms looked something like this:

1
2
3
4
5
6
7
8
#include <stmpct/gk.hpp>

double epsilon = 0.1;
stmpct::gk g(epsilon);
for (int i = 0; i < 1000; ++i)
    g.insert(rand());
double p50 = g.quantile(0.5); // Approx. median
double p95 = g.quantile(0.95); // Approx. 95th percentile

For version 3.x of the library, I have modified the algorithms to be parameterized by type so that they are not limited to double. The algorithms now may be used by type which implements a few simple requirements: it must be default constructible, copy constructible, assignable, and implement operator<. This means that they can now be used with float, int, long double, a custom number type, or even std::string.

Read more...
Calculating Percentiles on Streaming Data
Programming c++
Published: 2018-03-29
Calculating Percentiles on Streaming Data

This blog post series shows my exploration with calculating percentiles on data using only a single pass. It showcases a few different streaming percentiles algorithms and ends with a C++ and JavaScript library which implements a few of these algorithms.

Calculating Percentiles on Streaming Data Part 7: Cormode-Korn-Muthukrishnan-Srivastava
Calculating Percentiles on Streaming Data c++ ckms gk javascript percentiles streaming
Published: 2018-03-29
Calculating Percentiles on Streaming Data Part 7: Cormode-Korn-Muthukrishnan-Srivastava

This is part 7/8 of my Calculating Percentiles on Streaming Data series.

In 2005, Graham Cormode, Flip Korn, S. Muthukrishnan, and Divesh Srivastava published a paper called Effective Computation of Biased Quantiles over Data Streams [CKMS05]. This paper took the Greenwald-Khanna algorithm [GK01] and made the following notable changes:

  1. Generalized the algorithm to work with arbitrary targeted quantiles, where a targeted quantile is a combination of a quantile $\phi$ and a maximum allowable error $\epsilon$. The algorithm will maintain the summary data structure $\mathsf{S}$ such that it will maintain each targeted quantile with the appropriate error rate.
  2. Simplified the Compress() operation by removing the concept of banding.

The paper also demonstrated that the new, more generalized algorithm could handle the following cases with ease:

Read more...
Escaping Strings in XPath 1.0
XML / XPath / XSLT c++ xpath
Published: 2008-06-03
Escaping Strings in XPath 1.0

XPath is a language for selecting nodes from an XML document. XPath is used extensively in XSLT and other XML technologies. I also vastly prefer using XPath (e.g. with XPathNavigator) over the XML DOM when manipulating XML in a non-streaming fashion.

In XPath, strings must be delimited by either single or double quotes. Given a quote character used to delimit a string, one can’t represent that same quote character within the string. This means that if you decide to use single quotes to delimit your XPath string, you couldn’t represent the string O’Reilly; use double quotes, and you can’t represent “Hello”.

Read more...
STL Objects and Win32 Module Boundaries
C++ c++ stl win32
Published: 2008-01-04
STL Objects and Win32 Module Boundaries

Let’s say you have the following function:

1
2
3
4
void AppendChar(std::string& s, char ch)
{
    s += ch;
}

What happens if this function is exported as an ordinal function from a DLL (not an inlined piece of code inside a header) and you call it from an EXE?

It works most of the time. When it doesn’t, it corrupts your heap and causes a spectacular mess.

Read more...