Collecting pings from software?

Friday 22 January 2016

Let's say I have a piece of software. In this case, it's some automation for installing and upgrading Open edX. I want to know how it is being used, for example, how many people in the last month used certain versions or switches.

To collect information like that, I can put together a URL in the program, and ping that URL. What's a good simple way to collect that information? What server or service is easy to use and can help me look at the data? Is this something I should use classic marketing web analytics for? Is there a more developer-centric service out there?
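The program side of this can be as small as an HTTP GET with the facts encoded as query parameters. A minimal sketch, with a made-up endpoint and parameter names:

```python
# Sketch: build and send a usage ping as a URL with query parameters.
# The endpoint ("example.com/ping") and parameter names are hypothetical.
import urllib.parse
import urllib.request

def build_ping_url(base, version, switches):
    """Encode the data to report as query parameters."""
    params = urllib.parse.urlencode({
        "version": version,
        "switches": ",".join(sorted(switches)),
    })
    return f"{base}?{params}"

def send_ping(url, timeout=2):
    """Fire and forget: a failed ping should be invisible to the user."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
    except OSError:
        pass

url = build_ping_url("https://example.com/ping", "1.2.3", ["--upgrade", "--verbose"])
```

Whatever collects these on the server end then only has to count requests and group by parameter.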

This is one of those things that seems easy enough to just do with bit.ly, or a dead-stupid web server with access logs, but I'm guessing there are better ways I don't yet know about.


Comments

Tolomea 3:36 PM on 22 Jan 2016

Maybe statsd feeding into something like hosted Graphite. It provides an easy way to increment metrics and then graph what those metrics are doing over time.
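For what it's worth, statsd's wire protocol is plain text over UDP, so a counter increment is a one-line packet. A sketch with made-up host and metric names (in practice you would likely use a statsd client library):

```python
# Sketch of the statsd suggestion: counters over UDP.
# statsd's wire format for a counter is "<name>:<value>|c".
# The host, port, and metric names below are assumptions.
import socket

def statsd_packet(name, value=1):
    """Format a counter increment in the statsd wire protocol."""
    return f"{name}:{value}|c".encode("ascii")

def increment(name, host="localhost", port=8125):
    """Send one UDP packet; statsd aggregates and forwards to graphite."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(statsd_packet(name), (host, port))
    sock.close()

# e.g. increment("openedx.install.started")
```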

Jonathan Hartley 6:06 PM on 22 Jan 2016

I haven't done this, but I speculate that one way might be to use the AWS 'lambda' functionality to wire up a handler that runs in response to a particular URL being hit. The advantage of this is that you don't need a persistent server; you only get charged for the CPU/IO cycles your handler uses, so it's very cheap and easy to maintain. One implementation might have the handler simply append counts to an S3 bucket. The contents of a bucket can be exposed at a URL, or downloaded from the AWS web GUI. If you want a dead simple system, stop there.

But if you want to automate the processing of that data to produce a report of some kind, you could have a second task that runs on a regular basis, reads from the first bucket, processes it, and writes the report to a second bucket.

Perhaps you could stray into using structured data stores instead of buckets if you need something fancier.
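The Lambda idea above might look roughly like this in Python. The bucket name, key scheme, and API Gateway event shape are all assumptions, not a tested deployment:

```python
# Sketch of a Lambda handler behind an HTTP endpoint that records pings.
# Bucket name ("my-ping-bucket"), key scheme, and event shape are made up.
import json
import time

def format_record(query_params):
    """One JSON line per ping, ready to append to a log object in S3."""
    record = {"time": int(time.time())}
    record.update(query_params or {})
    return json.dumps(record, sort_keys=True)

def lambda_handler(event, context):
    # boto3 is imported lazily so the formatting code above is testable
    # without AWS; it is available by default in the Lambda runtime.
    import boto3
    s3 = boto3.client("s3")
    line = format_record(event.get("queryStringParameters"))
    key = f"pings/{time.strftime('%Y-%m-%d')}/{context.aws_request_id}.json"
    s3.put_object(Bucket="my-ping-bucket", Key=key, Body=line.encode())
    return {"statusCode": 204, "body": ""}
```

A second scheduled task could then list the `pings/` prefix and roll the records up into a report, as described above.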

Michael Kohne 2:14 PM on 23 Jan 2016

Do be careful about your users - not everyone is entirely happy about their software reporting home, even for the most mundane of reasons.

Mark Roddy 6:46 PM on 23 Jan 2016

Most web analytics services have the ability to publish "events" in addition to tracking normal web data like page views. Events usually take the form of arbitrary key-value pairs, and can usually be published from the back end of a web tier in addition to the front-end JS side, so there's probably an SDK for your favorite language. There's a bit more legwork involved in learning a third-party API/client than in hitting your own URL via curl, but you'll make that up and then some on the data collection, ingestion, aggregation, and reporting side, since they give you all of that out of the box. Here are a few services I've used, though I can't really say how many have a free tier and/or fit your budget/needs:
* Google Analytics
* Kissmetrics
* Heap Analytics
* Mixpanel
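As one concrete example of the event idea, Google Analytics exposes a plain HTTP "Measurement Protocol", so no SDK is strictly required; the other services have analogous APIs. The tracking id and category/action values here are placeholders:

```python
# Sketch: a GA Measurement Protocol "event" hit as key/value pairs.
# The tracking id, client id, and event names are placeholders.
import urllib.parse
import urllib.request

GA_ENDPOINT = "https://www.google-analytics.com/collect"

def event_payload(tracking_id, client_id, category, action, label=None):
    """Form-encode the fields for a GA event hit."""
    data = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property id, e.g. "UA-XXXX-Y"
        "cid": client_id,    # anonymous client id
        "t": "event",        # hit type
        "ec": category,      # event category
        "ea": action,        # event action
    }
    if label:
        data["el"] = label
    return urllib.parse.urlencode(data)

def send_event(payload):
    """POST the hit; GA does the collection and reporting."""
    urllib.request.urlopen(GA_ENDPOINT, data=payload.encode("ascii"))
```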

Mark Weiss 12:46 AM on 24 Jan 2016

+1 for statsd over graphite. Very scalable (I use it for systems doing O(100,000) operations per second), and you can do flexible charting. You can partition the data any way you like by recording it under namespaced metric names delimited with periods.
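That period-delimited namespacing can be sketched as a tiny helper. The segment names are made up; the one real constraint is that graphite treats periods as path separators, so periods inside a segment (like a version number) need to be escaped:

```python
# Sketch: build period-delimited graphite metric names so data can be
# sliced along any segment. The segment names here are made up.
def metric_name(*parts):
    """Join namespace segments, escaping periods graphite reserves."""
    return ".".join(str(p).replace(".", "_") for p in parts)

# One counter per (version, command), all under a common prefix:
name = metric_name("openedx", "install", "2.4.1", "upgrade")
# → "openedx.install.2_4_1.upgrade"
# Graphite can then aggregate across e.g. openedx.install.*.upgrade
```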

Corey 7:05 AM on 3 Feb 2016

Ali and I were collecting performance data from end users at edX. We instrumented some JavaScript to measure various stages of DOM loading. Once our timers all fired in a user's browser, a beacon was sent back to us via XHR (a simple HTTP GET with query params).

> a dead-stupid web server with access logs

don't ever underestimate the power of dead-stupid! :)

Our infrastructure was exactly as you describe. But the killer feature was forwarding the access logs to a Splunk indexer. Our access log data then became searchable in near real-time. Splunk is an awesome tool for slicing and dicing data like this. With the profiling data beaconed back, we were able to gather a rich set of metrics from real users. We used Splunk to segment and analyze the data very quickly and produce reports.
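The log-mining half of the "dead-stupid" pipeline can be sketched in a few lines: pull the query parameters back out of each access-log line and tally them. This assumes a typical nginx access-log format and made-up parameter names; Splunk does this kind of field extraction for you, at scale:

```python
# Sketch: recover query parameters from nginx access-log lines and count
# them. Assumes the common 'GET <path> HTTP/x.x' request format; the
# "version" parameter name is made up.
import re
from collections import Counter
from urllib.parse import parse_qs, urlparse

LOG_RE = re.compile(r'"GET (?P<path>\S+) HTTP')

def params_from_line(line):
    """Extract the query string from one log line as a flat dict."""
    m = LOG_RE.search(line)
    if not m:
        return {}
    qs = urlparse(m.group("path")).query
    return {k: v[0] for k, v in parse_qs(qs).items()}

def count_versions(lines):
    """Tally pings by their 'version' parameter."""
    return Counter(p.get("version") for p in map(params_from_line, lines) if p)
```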

NGINX + Splunk worked great for this task, and it was trivial to configure.
