RUN: 201904191310 25 8ef45 200 24 Jan 2018 12:23:34 342 522 542 124 123 452
RUN: 201904191310 25 8ef45 300 24 Jan 2018 12:24:54 423 252 452 241 231 542
It needs to be converted to CSV like this:
calib_run,temp_set,unit_id,targconc,time,adc_cond,adc_temp,count
201904191310,25,8ef45,200,24 Jan 2018 12:23:34,335.67,366.0,3
201904191310,25,8ef45,300,24 Jan 2018 12:24:54,368.67,345.0,3
I.e. group by the ID fields and take the mean and count of the observation fields - e.g. the 335.67 in the first CSV row is the mean of 342, 542, and 123, the three adc_cond readings in the first RUN. It could be done easily enough in vanilla Python, but it seemed like a nice simple case to experiment with one of those automatic parser generators. Without much research I picked Lark. From its docs, I came up with the following definition:
start: block+
block: "RUN:" calib_run temp_set unit_id targconc time obs+
calib_run: NUMBER
temp_set: NUMBER
targconc: NUMBER
unit_id: CHARS
time: DATE
obs: adc_cond adc_temp
adc_cond: NUMBER
adc_temp: NUMBER
%import common.NUMBER
%import common.WS
%ignore WS
CHARS: /\S+/
DATE: NUMBER WS+ MNTH WS+ NUMBER WS+ CHARS
MNTH: ("Jan"|"Feb"|"Mar"|"Apr"|"May"|"Jun"|"Jul"|"Aug"|"Sep"|"Oct"|"Nov"|"Dec")
My "Domain Specific Language" (DSL) starts with the constant sentinel string "RUN:", and then a set of ID fields and then some number of observations that are pairs of numbers. This is Extended Backus-Naur form (EBNF), a way of formally describing language structure, similar to what you see in the Python docs. and other places.
When you run it, you get a tree of nodes, like this:
start
  block
    calib_run   201904191310
    temp_set    25
    unit_id     8ef45
    targconc    200
    time        24 Jan 2018 12:23:34
    obs
      adc_cond  342
      adc_temp  522
    obs
      adc_cond  542
      adc_temp  124
  block
    calib_run   201904191310
    temp_set    25
    unit_id     8ef45
Code looks something like this:
from lark import Lark, Token

parser = Lark(grammar)
pt = parser.parse(text)
print(pt.pretty())

where grammar and text are the EBNF and raw Arduino text shown above.
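Each node in that tree is a lark Tree with a .data attribute (the rule name) and a .children list whose leaves are Token objects; that's the structure the code below walks. A quick sketch of poking at the first block, assuming pt is the parse tree from above:

block = pt.children[0]     # the first "block" subtree under "start"
print(block.data)          # -> block
calib = block.children[0]  # the "calib_run" subtree
print(calib.children[0])   # -> 201904191310 (a Token, which is a str subclass)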
Here's some code (parse_text is the entry point) which uses EBNF, Lark, and Pandas to convert the text to a list of lists, essentially the parsed form of the CSV output we want:
import pandas as pd
from lark import Lark, Token

flds = ['calib_run', 'temp_set', 'unit_id', 'targconc', 'time',
        'adc_cond', 'adc_temp']

def proc_block(node, callback, state=None):
    # walk a block's subtree, recording each leaf's value in state and
    # invoking callback at every leaf
    if state is None:
        state = dict(__res=[])
    if isinstance(node.children[0], Token):
        state[node.data] = str(node.children[0])
        callback(state, node)
    else:
        for child in node.children:
            proc_block(child, callback, state)
    return state

def callback(state, node):
    # adc_temp is the last field in an observation, so emit a row here
    if node.data == 'adc_temp':
        state['__res'].append([state[i] for i in flds])

def parse_text(text, grammar):
    parser = Lark(grammar)
    pt = parser.parse(text)
    res = None
    num = [i for i in flds if 'adc_' in i]  # observation (numeric) fields
    grp = list(set(flds) - set(num))        # ID fields to group by
    for block in pt.children:
        df = proc_block(block, callback)['__res']
        df = pd.DataFrame(df, columns=flds)
        for fld in num:
            df[fld] = df[fld].astype(float)
        counts = df.groupby(grp).count()
        means = df.groupby(grp).mean().round(2).reset_index()
        means['count'] = counts['adc_cond'].tolist()
        res = means if res is None else res.append(means)
    res = res[flds + ['count']]  # reorder to put ID fields first again
    return [res.columns.tolist()] + res.values.tolist()
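And hypothetically, to turn that list of lists into actual CSV text (assuming the grammar and the raw log are already in the strings grammar and text), something like this would do:

import csv, io

rows = parse_text(text, grammar)  # [[header...], [row...], ...]
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())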
So that's pretty much mission accomplished as far as converting the input to CSV goes.
To make it easy for the target audience to use, I decided to wrap it in a web app using Flexx. Nothing complicated there - just a text area, a label, and a button. The label tells you to paste your text into the text area, and the button converts from the raw form to CSV. These datasets are small enough to handle by copy / pasting.
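A rough sketch of what such a layout might look like in Flexx (just an illustration; the widget arrangement and label text here are my assumptions, not the actual app code):

from flexx import flx

class Main(flx.Widget):
    def init(self):
        with flx.VBox():
            flx.Label(text='Paste your raw log text below, then press Convert')
            self.src = flx.MultiLineEdit(flex=1)      # the paste-in text area
            self.convert = flx.Button(text='Convert to CSV')

To make it work as a Docker container I have this: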
if __name__ == '__main__':
    a = flx.App(Main)
    a.serve()
    flx.create_server(host="0.0.0.0", port=8000)
    flx.start()
in my Flexx code, so it listens on all interfaces, not just 127.0.0.1, and on a predictable port, 8000. It will be the only thing in the Docker container, so we know 8000 is available. The Dockerfile looks like this:
FROM continuumio/miniconda3
RUN conda install -c conda-forge lark-parser pandas flexx
RUN mkdir /.webruntime \
    && chmod a+rwx /.webruntime
COPY log2csv.py log2csv_ui.py /
CMD ["python", "log2csv_ui.py"]

and works as expected. A quick test with an SSH tunnel to the Docker container on the remote host, and everything looks good, ready to deploy. I deploy these containers with Apache proxies to an address like
http://example.com/log2csv/. I put the proxy in place, but argh, it only proxies the http:// requests, not the ws:// requests.
Here's the Apache config that fixed that:
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_html_module modules/mod_proxy_html.so
LoadModule proxy_wstunnel_module modules/mod_proxy_wstunnel.so
LoadModule rewrite_module modules/mod_rewrite.so
ProxyPass /log2csv/ http://log2csv:8000/
ProxyHTMLURLMap http://log2csv:8000/ /log2csv/
RewriteEngine on
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteCond %{HTTP:Connection} upgrade [NC]
RewriteRule .* "ws://log2csv:8000%{REQUEST_URI}" [P]
<Location /log2csv/>
    ProxyPassReverse /
    ProxyHTMLURLMap / /log2csv/
</Location>
Note that the Apache server is also running in a (different) Docker container, linked to the Flexx web app container with --link log2csv, so the hostname for the Flexx web app container is just log2csv from within the Apache server's container. In a different context you might use 127.0.0.1.
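For illustration, the two containers might be started something like this (the image and container names here are made up, not my actual deployment):

docker build -t log2csv .                          # build the Flexx app image from the Dockerfile above
docker run -d --name log2csv log2csv               # start it; --name gives the link target its hostname
docker run -d --link log2csv -p 80:80 my-apache    # Apache container carrying the proxy config above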
There are probably some hidden rough edges in the above Apache config. Using
proxy_wstunnel_module is supposed to be sufficient by itself, as it allows you to write ProxyPass /log2csv/ ws://log2csv:8000/. But that didn't work, I think because the ws:// request was missing the /log2csv/ subpath. I think it was missing because the ProxyHTMLURLMap didn't fix the ws:// request in the JavaScript(?) that generated it. So instead it's correctly routed by the RewriteRule. But that targets all ws:// requests, so something else would be needed if there was another websocket app on the server.