```
RUN: 201904191310 25 8ef45 200 24 Jan 2018 12:23:34 342 522 542 124 123 452
RUN: 201904191310 25 8ef45 300 24 Jan 2018 12:24:54 423 252 452 241 231 542
```
It needs to be converted to CSV like this:
```
calib_run,temp_set,unit_id,targconc,time,adc_cond,adc_temp,count
201904191310,25,8ef45,200,24 Jan 2018 12:23:34,335.67,366.0,3
201904191310,25,8ef45,300,24 Jan 2018 12:24:54,368.67,345.0,3
```
I.e. group by the ID fields and take the mean and count of the observation fields. It could be done easily enough in vanilla Python (a rough sketch of that route follows), but it seemed like a nice simple case to experiment with one of those automatic parser generators.
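For concreteness, here is roughly what that vanilla split-and-zip approach could look like. It's just an illustration, not code from the app; `split_runs` is a hypothetical helper.

```python
# Rough illustration only: pull apart each RUN: record with plain string
# handling, no parser generator involved.
def split_runs(text):
    records = []
    for rec in text.split('RUN:')[1:]:            # one chunk per record
        parts = rec.split()
        ids = parts[:4] + [' '.join(parts[4:8])]  # calib_run, temp_set, unit_id, targconc, time
        nums = [float(x) for x in parts[8:]]
        obs = list(zip(nums[0::2], nums[1::2]))   # (adc_cond, adc_temp) pairs
        records.append((ids, obs))
    return records

# For the first example record above this gives
#   ids = ['201904191310', '25', '8ef45', '200', '24 Jan 2018 12:23:34']
#   obs = [(342.0, 522.0), (542.0, 124.0), (123.0, 452.0)]
```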
Without much research I picked Lark. From its docs, I came up with the following definition:

```
start: block+
block: "RUN:" calib_run temp_set unit_id targconc time obs+
calib_run: NUMBER
temp_set: NUMBER
targconc: NUMBER
unit_id: CHARS
time: DATE
obs: adc_cond adc_temp
adc_cond: NUMBER
adc_temp: NUMBER
%import common.NUMBER
%import common.WS
%ignore WS
CHARS: /\\S+/
DATE: NUMBER WS+ MNTH WS+ NUMBER WS+ CHARS
MNTH: ("Jan"|"Feb"|"Mar"|"Apr"|"May"|"Jun"|"Jul"|"Aug"|"Sep"|"Oct"|"Nov"|"Dec")
```
My "Domain Specific Language" (DSL) starts with the constant sentinel string "RUN:", and then a set of ID fields and then some number of observations that are pairs of numbers. This is Extended Backus-Naur form (EBNF), a way of formally describing language structure, similar to what you see in the Python docs. and other places.
When you run it, you get a tree of nodes, like this:
```
start
  block
    calib_run  201904191310
    temp_set   25
    unit_id    8ef45
    targconc   200
    time       24 Jan 2018 12:23:34
    obs
      adc_cond 342
      adc_temp 522
    obs
      adc_cond 542
      adc_temp 124
    ...
  block
    calib_run  201904191310
    temp_set   25
    unit_id    8ef45
    ...
```

The code looks something like this:
```python
from lark import Lark, Token

parser = Lark(grammar)
pt = parser.parse(text)
print(pt.pretty())
```

where `grammar` and `text` are the EBNF and raw Arduino text shown above.
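The `Tree` objects Lark returns are easy to walk: each rule becomes a node whose `.data` is the rule name and whose `.children` are sub-trees or `Token`s. A minimal sketch of pulling the values out of the first block, reusing the `grammar` and `text` from above:

```python
from lark import Lark, Token

parser = Lark(grammar)
tree = parser.parse(text)

first_block = tree.children[0]                  # Tree('block', [...])
for node in first_block.children:
    if isinstance(node.children[0], Token):
        # leaf rules (calib_run, temp_set, ..., time) wrap a single Token
        print(node.data, '=', str(node.children[0]))
    else:
        # obs nodes wrap adc_cond / adc_temp sub-trees
        print(node.data, [(c.data, str(c.children[0])) for c in node.children])
```

The full conversion code below walks the tree in the same way, just recursively.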
Here's some code (`parse_text` is the entry point) which uses EBNF, Lark, and Pandas to convert the text to a list of lists, essentially the parsed form of the CSV output we want:
```python
import pandas as pd
from lark import Lark, Token

flds = ['calib_run', 'temp_set', 'unit_id', 'targconc', 'time', 'adc_cond', 'adc_temp']

def proc_block(node, callback, state=None):
    """Walk a block's sub-tree, recording each token in `state` by rule name."""
    if state is None:
        state = dict(__res=[])
    if isinstance(node.children[0], Token):
        state[node.data] = str(node.children[0])
        callback(state, node)
    else:
        for child in node.children:
            proc_block(child, callback, state)
    return state

def callback(state, node):
    # an adc_temp token completes one observation, so emit a row
    if node.data == 'adc_temp':
        state['__res'].append([state[i] for i in flds])

def parse_text(text, grammar):
    parser = Lark(grammar)
    pt = parser.parse(text)
    res = None
    num = [i for i in flds if 'adc_' in i]
    grp = list(set(flds) - set(num))
    for block in pt.children:
        df = proc_block(block, callback)['__res']
        df = pd.DataFrame(df, columns=flds)
        for fld in num:
            df[fld] = df[fld].astype(float)
        counts = df.groupby(grp).count()
        means = df.groupby(grp).mean().round(2).reset_index()
        means['count'] = counts['adc_cond'].tolist()
        res = means if res is None else res.append(means)
    res = res[flds + ['count']]  # reorder to put ID fields first again
    return [res.columns.tolist()] + res.values.tolist()
```

So that's pretty much mission accomplished as far as converting the input to CSV goes. To make it easy for the target audience to use, I decided to wrap it in a web app using Flexx. Nothing complicated there - just a text area, a label, and a button. The label tells you to paste your text into the text area, and the button converts from the raw form to CSV. These datasets are small enough to handle by copy/pasting.
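A sketch of roughly what such a Flexx UI can look like (an illustration of the layout and wiring, not the app's actual code; it assumes `parse_text` and the `grammar` string are importable from `log2csv.py`, matching the file names in the Dockerfile below):

```python
from flexx import flx
from log2csv import parse_text, grammar   # hypothetical import of the code above

class Main(flx.PyComponent):
    """Root component lives in Python, so the Lark/pandas code runs server-side."""

    def init(self):
        with flx.VBox():
            self.label = flx.Label(text='Paste the raw log text below, then press Convert')
            self.edit = flx.MultiLineEdit(flex=1)
            self.button = flx.Button(text='Convert to CSV')

    @flx.reaction('button.pointer_click')
    def _convert(self, *events):
        rows = parse_text(self.edit.text, grammar)
        csv_text = '\n'.join(','.join(str(v) for v in row) for row in rows)
        self.edit.set_text(csv_text)   # show the CSV in the same text box
```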
To make it work as a Docker container I have this:

```python
if __name__ == '__main__':
    a = flx.App(Main)
    a.serve()
    flx.create_server(host="0.0.0.0", port=8000)
    flx.start()
```

in my Flexx code, so it listens on all interfaces, not just 127.0.0.1, and on a predictable port, 8000. It will be the only thing in the Docker container, so we know port 8000 is available. The Dockerfile looks like this:
```dockerfile
FROM continuumio/miniconda3
RUN conda install -c conda-forge lark-parser pandas flexx
RUN mkdir /.webruntime \
 && chmod a+rwx /.webruntime
COPY log2csv.py log2csv_ui.py /
CMD ["python", "log2csv_ui.py"]
```

and it works as expected. A quick test through an SSH tunnel to the Docker container on the remote host, everything looks good, ready to deploy. I deploy these containers behind Apache proxies at an address like `http://example.com/log2csv/`. I put the proxy in place, but argh, it only proxies the `http://` requests, not the `ws://` requests (Flexx keeps the browser and the Python server talking over a websocket, so without that the app is dead in the water).
Here's the Apache config that fixed that:
```apache
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_html_module modules/mod_proxy_html.so
LoadModule proxy_wstunnel_module modules/mod_proxy_wstunnel.so
LoadModule rewrite_module modules/mod_rewrite.so

ProxyPass /log2csv/ http://log2csv:8000/
ProxyHTMLURLMap http://log2csv:8000/ /log2csv/

RewriteEngine on
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteCond %{HTTP:Connection} upgrade [NC]
RewriteRule .* "ws://log2csv:8000%{REQUEST_URI}" [P]

<Location /log2csv/>
    ProxyPassReverse /
    ProxyHTMLURLMap / /log2csv/
</Location>
```

Note that the Apache server is also running in a (different) Docker container, linked to the Flexx web app container with `--link log2csv`, so from within the Apache container the Flexx web app container is reachable simply as the hostname `log2csv`. In a different context you might use `127.0.0.1`.
There are probably some hidden rough edges in the above Apache config. Using `proxy_wstunnel_module` is supposed to be sufficient by itself, as it allows you to write `ProxyPass /log2csv/ ws://log2csv:8000/`. But that didn't work, I think because the `ws://` request was missing the `/log2csv/` subpath. I think it was missing because `ProxyHTMLURLMap` didn't fix the `ws://` URL in the JavaScript(?) that generated it. So instead the request is correctly routed by the `RewriteRule`. But that rule targets all `ws://` requests, so something else would be needed if there were another websocket app on the server.