SuccessChanges

Summary

  1. Store pipeline state + switch to argparse (commit: b38366beb8d716b10d4e0a14d64a321150969375) (details)
  2. Modify the load script to also load the pipeline state (commit: 06a0a4e2b241847807f43f89331808fc7c11388b) (details)
  3. Read first place with a manual query (commit: c2265604999a1d3f1707c29a8567da01304e638f) (details)
Commit b38366beb8d716b10d4e0a14d64a321150969375 by shankari
Store pipeline state + switch to argparse
The raw data and the analysis results do not constitute the entire state
of a pipeline. In particular, if we store only the raw + analysis
results, and then we try to run the pipeline again, we will end up with
two copies of the analysis results.
Instead, when we transfer data, it should include the raw data, the
pipeline state, and the analysis results.
Change this code to store the pipeline state as well.
And since I am in there changing things anyway, switch to argparse to
handle the arguments as well.
```
$ ./e-mission-py.bash bin/debug/extract_timeline_for_day_range_and_user.py -e test_output_gen_curr_ts -- 2010-01-01 2020-01-01 /tmp/test_dump
storage not configured, falling back to sample, default configuration
Connecting to database URL localhost
INFO:root:==================================================
INFO:root:Extracting timeline for user d4dfcc42-b6fc-4b6b-a246-d1abec1d039f day 2010-01-01 -> 2020-01-01 and saving to file /tmp/test_dump
DEBUG:root:start_day_ts = 1262304000 (2010-01-01T00:00:00+00:00), end_day_ts = 1577836800 (2020-01-01T00:00:00+00:00)
DEBUG:root:curr_query = {'user_id': UUID('d4dfcc42-b6fc-4b6b-a246-d1abec1d039f'), 'data.ts': {'$lte': 1577836800, '$gte': 1262304000}}, sort_key = data.ts
DEBUG:root:orig_ts_db_keys = None, analysis_ts_db_keys = None
DEBUG:root:finished querying values for None
DEBUG:root:finished querying values for None
DEBUG:root:curr_query = {'user_id': UUID('d4dfcc42-b6fc-4b6b-a246-d1abec1d039f'), 'data.start_ts': {'$lte': 1577836800, '$gte': 1262304000}}, sort_key = data.start_ts
DEBUG:root:orig_ts_db_keys = None, analysis_ts_db_keys = None
DEBUG:root:finished querying values for None
DEBUG:root:finished querying values for None
DEBUG:root:curr_query = {'user_id': UUID('d4dfcc42-b6fc-4b6b-a246-d1abec1d039f'), 'data.enter_ts': {'$lte': 1577836800, '$gte': 1262304000}}, sort_key = data.enter_ts
DEBUG:root:orig_ts_db_keys = None, analysis_ts_db_keys = None
DEBUG:root:finished querying values for None
DEBUG:root:finished querying values for None
INFO:root:Found 1449 loc entries, 27 trip-like entries, 19 place-like entries = 1495 total entries
INFO:root:timeline has unique keys = {'stats/server_api_error', 'statemachine/transition', 'analysis/cleaned_stop', 'background/filtered_location', 'segmentation/raw_trip', 'background/location', 'segmentation/raw_stop', 'segmentation/raw_section', 'stats/client_time', 'background/motion_activity', 'analysis/recreated_location', 'segmentation/raw_place', 'analysis/cleaned_trip', 'background/battery', 'analysis/cleaned_section', 'stats/server_api_time', 'analysis/cleaned_place', 'stats/pipeline_time', 'stats/client_nav_event'}
INFO:root:Found 6 pipeline states [6, 1, 2, 3, 11, 9]
$ ls -1 /tmp/test_dump_*
/tmp/test_dump_d4dfcc42-b6fc-4b6b-a246-d1abec1d039f.gz
/tmp/test_dump_pipelinestate_d4dfcc42-b6fc-4b6b-a246-d1abec1d039f.gz
```
(commit: b38366beb8d716b10d4e0a14d64a321150969375)
The file was modified bin/debug/extract_timeline_for_day_range_and_user.py (diff)
Commit 06a0a4e2b241847807f43f89331808fc7c11388b by shankari
Modify the load script to also load the pipeline state
This is the load change corresponding to
b38366beb8d716b10d4e0a14d64a321150969375
```
$ ./e-mission-py.bash bin/debug/load_multi_timeline_for_range.py /tmp/test_dump_
storage not configured, falling back to sample, default configuration
Connecting to database URL localhost
INFO:root:Loading file or prefix /tmp/test_dump_
INFO:root:Found 2 matching files for prefix /tmp/test_dump_
INFO:root:files are ['/tmp/test_dump_d4dfcc42-b6fc-4b6b-a246-d1abec1d039f.gz', '/tmp/test_dump_pipelinestate_d4dfcc42-b6fc-4b6b-a246-d1abec1d039f.gz']
... ['/tmp/test_dump_d4dfcc42-b6fc-4b6b-a246-d1abec1d039f.gz']
INFO:root:==================================================
INFO:root:Loading data from file /tmp/test_dump_d4dfcc42-b6fc-4b6b-a246-d1abec1d039f.gz
INFO:root:Analyzing timeline...
INFO:root:timeline has 1495 entries
INFO:root:timeline has data from 1 users
INFO:root:timeline has the following unique keys {'segmentation/raw_trip', 'analysis/cleaned_place', 'stats/client_time', 'analysis/recreated_location', 'segmentation/raw_section', 'stats/pipeline_time', 'analysis/cleaned_section', 'background/motion_activity', 'analysis/cleaned_trip', 'analysis/cleaned_stop', 'segmentation/raw_stop', 'segmentation/raw_place', 'stats/server_api_time', 'background/battery', 'background/filtered_location', 'statemachine/transition', 'background/location', 'stats/server_api_error', 'stats/client_nav_event'}
INFO:root:timeline for user d4dfcc42-b6fc-4b6b-a246-d1abec1d039f contains analysis results
Loading pipeline state for d4dfcc42-b6fc-4b6b-a246-d1abec1d039f from /tmp/test_dump__pipelinestate_d4dfcc42-b6fc-4b6b-a246-d1abec1d039f.gz
INFO:root:Creating user entries for 1 users
INFO:root:pattern = user-%01d
INFO:root:For 1 users, loaded 1272 raw entries, 223 processed entries and 6 pipeline states
INFO:root:all entries in the timeline contain analysis results, no need to run the intake pipeline
```
(commit: 06a0a4e2b241847807f43f89331808fc7c11388b)
The file was modified bin/debug/load_multi_timeline_for_range.py (diff)
Commit c2265604999a1d3f1707c29a8567da01304e638f by shankari
Read first place with a manual query
Since the first places have no `enter_ts`, they are not matched by the
regular range queries.  Testing done:
https://github.com/e-mission/e-mission-server/pull/562#issuecomment-357591130
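To illustrate why the manual query is needed: the first place in a timeline has no enter event, so a range query on `data.enter_ts` (the pattern visible in the extract log above) can never match it. The sketch below is plain Python standing in for MongoDB; the query dicts use real MongoDB operators (`$gte`/`$lte`/`$exists`), but `matches` is only a minimal hypothetical evaluator for demonstration, not server code.

```python
def range_query(user_id, start_ts, end_ts):
    # the regular query: only matches places whose enter_ts is set
    return {"user_id": user_id,
            "data.enter_ts": {"$gte": start_ts, "$lte": end_ts}}

def first_place_query(user_id):
    # the first place has no enter event, so query for the
    # absence of the field instead of a timestamp range
    return {"user_id": user_id, "data.enter_ts": {"$exists": False}}

def matches(doc, query):
    # tiny evaluator covering only the operators used above
    for field, cond in query.items():
        value = doc
        for part in field.split("."):
            value = value.get(part) if isinstance(value, dict) else None
        if isinstance(cond, dict):
            if "$exists" in cond and (value is not None) != cond["$exists"]:
                return False
            if "$gte" in cond and (value is None or value < cond["$gte"]):
                return False
            if "$lte" in cond and (value is None or value > cond["$lte"]):
                return False
        elif value != cond:
            return False
    return True
```

Running the two queries against a first place (no `data.enter_ts`) and an ordinary place shows the gap: the range query skips the first place entirely, while the `$exists: False` query catches exactly it.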
(commit: c2265604999a1d3f1707c29a8567da01304e638f)
The file was modified bin/debug/extract_timeline_for_day_range_and_user.py (diff)