Commit Graph

205 Commits

Author SHA1 Message Date
TreeHugger Robot
07d7baf36c Merge "statsd events/gauge: remove WallClockTime" 2018-09-12 21:43:03 +00:00
TreeHugger Robot
b299070edf Merge "Remove dimension fields in GaugeMetric output" 2018-09-10 17:47:30 +00:00
Yangster-mac
e124e42582 Interface of writing key value pair atom to socket and parsing from statsd.
Test: statsd unit test
BUG: b/114231161

Change-Id: I3543900934b5e8e0677bf1e7cc454d61064a2475
2018-09-07 11:09:37 -07:00
Bookatz
fe2dde8184 statsd events/gauge: remove WallClockTime
EventMetricData stores wall_clock_timestamp_nanos.
It is expensive, costing 10 bytes per event and evidently not needed.
Similar for GaugeMetricData.

Bug: 113072343
Test: make -j8 statsd_test && adb sync data && adb shell data/nativetest64/statsd_test/statsd_test
Test: run cts-dev -m CtsStatsdHostTestCases
Test: Manually confirm that events/gauges don't have wallclock
Change-Id: Iae978a434354c049e1fa61d42536be981c862b4f
2018-09-05 10:53:33 -07:00
Chenjie Yu
4c31f67965 Remove dimension fields in GaugeMetric output
GaugeMetric output is atom format and can contain all fields including
those already included in dimensions.
This is not necessary and duplicate.
And, if we do set dimensions, we can use this to reduce data size
similar to repeated fields, where same dimension value only appear once.

Bug: 113061955
Test: unit test
Change-Id: I299bd1cb1b9b90ea7426ef182df78d2ffc091910
2018-08-22 14:14:02 -07:00
Yangster-mac
48b3d62bfe Create log event from key value maps.
BUG: b/112816333
Test: statsd test.
Change-Id: Ib66f06186abfacd77807436379e1e142a5b87c99
2018-08-19 22:37:59 +00:00
TreeHugger Robot
a5f28237e4 Merge "allow statsd pull based on event trigger" 2018-08-11 20:09:22 +00:00
Chenjie Yu
8858897205 allow statsd pull based on event trigger
Several restrictions:
1) This is only with GaugeMetric. We don't have use case for ValueMetric
to pull on event trigger.
2) trigger_event is set in the config. It can be generic atom matcher. But
we limit the number of atoms referenced in it to be 1. So we don't allow
multiple atoms to form a complex trigger.
3) This has to go with ALL_CONDITION_CHANGES sampling type.

+ also specify atom id of GaugeMetric output.

Bug: 111937835
Test: unit test
Change-Id: Ia15b1f209945f022edffb9ec5d673317d55d9e4f
2018-08-10 20:49:52 -07:00
TreeHugger Robot
632b39288d Merge "Remove the obsolete code for logd and add statsd socket log loss detection." 2018-08-11 00:30:47 +00:00
Yangster
8a34384797 Return unknown for combination condition eval when operation is NOT and
there is no child.

Test: added unit test and rerun the statsd tests.

BUG: b/112311529

Change-Id: I0c5829e3cb26474b7dbcc05f20c4311e9f801d97
2018-08-10 04:30:11 +00:00
Yao Chen
3ff3a490e4 Remove the obsolete code for logd and add statsd socket log loss detection.
+ Remove dead code
+ Add a simple log loss detection as a starter to see if there is any log loss
  detected at all.

TODO: If we do see log loss, we can add more sophisticated logging and reset mechanism.

Bug: 80538532
Test: statsd_test
Change-Id: Iff150c9d8f9f936dbd4586161a3484bef7035c28
2018-08-06 16:24:49 -07:00
Chenjie Yu
e1361ed422 Adjust 1st bucket start time
adjust 1st bucket start time for a partial bucket
also make valuemetric and gauge metric pull on first bucket

Bug: 111607838
Bug: 111660710
Bug: 111842941

Test: unit test
Change-Id: I5932c2258f8deac57e7abbf26f3214f87914a964
2018-07-27 10:53:38 -07:00
Chenjie Yu
a0f0224906 ValueMetric supports multiple aggregation types
1. Add support for MIN, MAX, AVG
2. ValueMetric also allow floats now, in addition to long data type.
AnomalyDetection still takes long only. I am not sure if it makes
sense to do anomaly on AVG. I will leave that for later.
3. ValueMetric supports sliced condition change for pushed events.
I don't think it makes sense for pulled events to have sliced condition
changes so leave it for now.

Test: unit test
Change-Id: I8bc510d98ea9b8a6eb16d04ff99dce6b574249cd
2018-07-13 10:24:41 -07:00
Yao Chen
5bfffb54da Clean up TODOs in statsd
+ Created bugs for those TODOs that are still relevant.
+ Remove obsolete TODOs.

Test: no code change.
Change-Id: I41c2a89a882f087817ee6cbc3f095e1d80e1928e
2018-06-25 11:08:04 -07:00
Chenjie Yu
e22192071d StatsPullerManager not use singleton
This is to be consistent with other patterns such as UidMap.
This also makes unit test simpler.

Change-Id: I1558cd609e470481f269ecf2ae616277a95cfbf0
Bug: 72722120
Test: unit test
2018-06-14 15:46:54 -07:00
Bookatz
d27ab45ad3 Remove TODO in statsd AnomalyTracker_test
The underlying item the TODO is referencing had already been resolved
so the test line can be properly added, per the TODO.

Change-Id: I5c16e7ea319bd16e37475381def656b38f39d17f
Fixes: 80095149
Test: make statsd_test && adb sync data && adb shell data/nativetest64/statsd_test/statsd_test
2018-05-24 10:35:02 -07:00
Yangster-mac
1c58f04cd3 Add a field in config to disable/enable the string hashing in metric report.
Statsd hashes (using its own hashing function) raw strings to reduce the
upload data size when there are duplicate strings in the report. And in cloud,
the clearcut translator would backfill the strings.

In a few droidfood users, we find the translator was unable to do that. While
debugging the root cause, we first decided to provide an option to disable
the hashing from the cloud.

Test: statsd unit test, CTS test, tested manually

BUG: b/79943763
Change-Id: If0359c8cf3f3cf83a2938db9ebf95ea7906f0b0c
2018-05-18 10:39:50 -07:00
Chenjie Yu
021e25307d ValueMetric pushed events should check condition
+ fix unit test flakiness

Bug: 79873404
Change-Id: I15b52a79b18c05603640781e4450e7b62fac24ba
Fix: 79873404
Test: unit test
2018-05-16 14:50:11 -07:00
David Chen
092a5a9b85 Fixes Value metrics in statsd and app upgrades.
Pulled value metrics with conditions had a subtle bug that caused
us to leave the condition on even if it should've been false.

Bug: 79778783
Test: Added unit-test and verified on marlin-eng.
Change-Id: I31f34791118319b3471f7a6ea8a024e2d511cfe7
2018-05-15 17:51:47 -07:00
David Chen
56ae0d9a48 Fixes statsd reports missing strings and SCS.
Reports written to disk don't contain the strings used, which will
make this report unusable if there are strings that don't show up
again. We should always include the strings, so this option is
removed entirely.

Also, we hard-coded the wrong number of fields when pulling
ModemActivityInfo. There are actually 10 fields, not 6.

Bug: 79601503
Test: Tested unit-tests pass on marlin-eng.
Change-Id: I6834b096ced77418a9cc2ddd79b08d1c9c447fae
2018-05-11 17:04:56 -07:00
TreeHugger Robot
a159842161 Merge "Fix the flaky gauge/value e2e test due to cached events." into pi-dev 2018-05-10 01:02:10 +00:00
android-build-team Robot
c3d0798455 Merge "Fix partial bucket unit tests." into pi-dev 2018-05-09 18:07:24 +00:00
Yangster-mac
58e609e339 Fix the flaky gauge/value e2e test due to cached events.
Test: statsd test
BUG: b/79265262
Change-Id: I4d67f1c2edb6215a3cea23f8c7b2e8d5099c4aac
2018-05-08 16:19:48 -07:00
David Chen
9e6dbbdadf Fix statsd returning uidmap with empty reports.
We notice devices uploading a bunch of bytes for the uidmap even if
the device is running an empty config, so there are no actual metrics
to report. This hardcodes some logic to skip the inclusion of the
uidmap if there are exactly 0 metrics.

Bug: 79381210
Test: Tested unit-tests on marlin-eng
Change-Id: I96348235341a7faf15ff57d4d1eccac635a3a999
2018-05-07 18:07:19 -07:00
Yao Chen
cc884dfc94 Fix partial bucket unit tests.
Bug: 79347749
Test: statsd_test
Change-Id: I69eee7172d6fe4ce895530f089193eb08653e269
2018-05-07 10:34:31 -07:00
David Chen
48944901f7 Fixes statsd returning too much data at once.
We observe a single ConfigMetricsReportList can be greater than the
safe size for the binder transaction buffer since we only check the
size of the current metrics in progress, but we also return the
previous reports stored on disk.

This change will attempt to send another ConfigMetricsReportList
as soon as possible if there's already a report on disk.

Also fixes a bug when trying to trigger data fetch before the client
has registered the corresponding dataFetchOperation.

Bug: 79201869
Test: Tested manually on marlin-eng
Change-Id: I2d3677162804a27e7a7a95d482d80c46bd994a67
2018-05-04 17:09:16 -07:00
Yangster-mac
892f3d3229 Reset statsd and correctly record the dump reason when system
server restarts/crashes.

Test: statsd test
BUG: b/79161505
Change-Id: I0646c764964f6eafde91f9ae0179a1c837af320d
2018-05-03 17:05:24 -07:00
Yangster-mac
9def8e3995 Reduce statsd log data size.
1. Hash the strings in metric dimensions.
2. Optimize the timestamp encoding in bucket.
   Use bucket num for full bucket and millis for
   partial bucket.
3. Encode the dimension path per metric and avoid
   deduping it across dimensons.

Test: statsd test
Change-Id: I18f69654de85edb21a9c835c73edead756295e05
BUG: b/77813755
2018-04-26 04:30:18 -07:00
Chenjie Yu
e36018b272 add dump report reason to reports
+ also change uidmapping version numbers to int64_t

Bug: 78132855
Change-Id: Iac7ea93e4bf651bd65bd03383e7ab4971af4fc29
Fix: 78132855
Test: gts test
2018-04-18 20:19:21 +00:00
David Chen
81245fd53a Adds option to drop small buckets for statsd.
We notice that some of the pulled metrics have a ton of data, and
during app upgrades, we're forming partial buckets that represent
small periods of time but require many bytes of data. We now have an
option to drop these buckets that are too short. Note that we still
have to pull the data to keep the metrics for the next bucket
correct. We include a new field in the value and gauge metric outputs
so that it's easy to tell when a bucket was dropped.

We drop the partial buckets also from anomaly detection since we
should be computing anomalies from the same data that is reported.

Test: Added unit-tests for value and gauge metrics.
Bug: 77925710
Change-Id: Ic370496377c6afd380e02278a6c1ed8b521a2731
2018-04-16 18:42:14 -07:00
Jeff Sharkey
6b64925737 Protect usage data with OP_GET_USAGE_STATS.
APIs that return package usage data (such as the new StatsManager)
must ensure that callers hold both the PACKAGE_USAGE_STATS permission
and the OP_GET_USAGE_STATS app-op.

Add noteOp() method that can be called from native code.

Also add missing security checks on command interface.

Bug: 77662908, 78121728
Test: builds, boots
Change-Id: Ie0d51e4baaacd9d7d36ba0c587ec91a870b9df17
2018-04-16 12:44:32 -06:00
TreeHugger Robot
6b317915e8 Merge "StatsManager throws exceptions" into pi-dev 2018-04-11 17:02:06 +00:00
Yao Chen
163d2602db Handle logd reconnect.
When statsd reconnects to logd, statsd will read all logs from buffer again. To prevent us from
reprocessing old events, we do the following:

1. At any given moment, record the largest timestamp(T_max) and last timestamp (check point) that
   we've seen before.
2. When reconnection happens, we look for the check point until we see a new log with a timestamp
   larger than T_max.
   -> If we found the CP, resume after the CP. Success
   -> If we can't find CP, there is definitely log loss. We reset all configs.

Note:
1. Logd has an API to read logs after a certain timestamp. But this api is vulnerable to
time changes from Settings. So we cannot rely on it.

2. If logd inserts a new log (with older timestamp) before CP, we cannot detect it. It's not
   possible to detect it without record all timestamps we have seen.

Test: statsd_test
Bug: 77813113

Change-Id: Ic3fdb47230807606ab11dc994cb162194adb8448
2018-04-10 22:06:03 -07:00
Yangster-mac
15f6bbc24f Flush the bucket when creating the metric producer.
Use int64 for value field.
E2e test for gauge/value metric.

BUG: b/74445671

Test: statsd test.
Change-Id: I823a0bade8f89834bdfb9cf48864852a47d7b63b
2018-04-10 20:25:13 -07:00
Bookatz
4f71629002 StatsManager throws exceptions
When StatsManager fails to connect to statsd, it now throws an exception
for the caller to catch. It also throws an exception of the config being
added is of an unreadable format.

Due to backwards compatibility issues, the old APIs could not be
changed, so new ones were made to replace the old ones. The old ones are
now temporary and will be removed when the compatibility issue is
resolved.

Bug: 77648233
Test: gts-tradefed run gts-dev --module GtsStatsdHostTestCases
Change-Id: Ibea05883a29b9b3ef9927d2f8fe295eb99832ab7
2018-04-10 19:07:32 -07:00
Chenjie Yu
ae63b0af94 Drop value if the bucket is totally tainted
Bug: 77870358
Change-Id: Ia96970a3254de08f94b91ad53be2fdb9f4db7eb4
Fix: 77870358
Test: unit test
2018-04-10 14:59:31 -07:00
Yangster-mac
e68f3a5811 Flush the partial bucket when startd shuts down or config updated.
Test: statsd test

BUG: b/77556036
Change-Id: Ie4a04ace55e07c4529cdff5906ba874f8815f620
2018-04-05 18:05:57 -07:00
David Chen
bd12527c90 Fix uid map to be simpler and fix partial bucket.
The previous scheme captured periodic snapshots for each config with
complex logic that's unnecessary and wasted memory. We actually don't
need to store any snapshots since we just convert the current state
into a snapshot and also include the deltas (change events) since the
previous report until now.

To make the system more robust, we also include up to 100 of the
deleted apps in the uid map.

Also, fix the wiring of the partial buckets so the metric producers
form partial buckets on both app upgrade and removal, but not on
installation of a new app.

Also, we update StatsCompanionService to also include disabled apps.

Bug: 77607583
Test: Verified unit-tests pass and added new e2e tests.
Change-Id: I98e1f544d6e6571545ae1581c4cebab807596f51
2018-04-05 16:15:01 -07:00
Yangster-mac
b142cc8add Statsd config TTL
Roughly check the config every hour to see whether the ttl expired.
If so, read the config from disk and recreate the metric manager.

Test: statsd test

BUG: b/77274363

Change-Id: I16838afe5bbe966c3a0f638869751f9b59a5a259
2018-04-04 15:59:43 +00:00
David Chen
faa1af535b Includes annotations with statsd reports.
It's tricky to determine the source of the metrics on a device
currently since we can take the union of multiple configs and send
only one giant statsd_config into statsd. We will use the int64 field
to track the sub config id's and the int32 field to track the version
for each sub config, but the fields are named more generically as
annotations.

The annotations are available in both the reports and metadata.

Test: Check that all unit-tests pass on marlin-eng
Bug: 77327261
Change-Id: Ic37c549c8b2991676f69948c515156765c9f5108
2018-04-03 18:20:40 -07:00
Yangster-mac
c04feba805 Move forward the alarm timestamp when config is added to statsd.
Test: statsd test
BUG: b/77344187

Change-Id: Ieacffaa29422829b8956f2b3fcb2c647c8c3eed9
2018-04-02 18:12:36 -07:00
TreeHugger Robot
46eef8d049 Merge "E2e test for periodic alarm." into pi-dev 2018-03-31 03:04:59 +00:00
TreeHugger Robot
3b826c6075 Merge "Statsd MAX now obeys refractory periods too" into pi-dev 2018-03-31 01:10:57 +00:00
Bookatz
0dbc7a4343 Statsd MAX now obeys refractory periods too
The logic of where refractory period enforcement was moved out of the
anomaly tracker and into the metric's predictAnomalyTimestamp. It was
done for ORING, but not for MAX. This fixes MAX.

Bug: 74446029
Test: adb sync data && adb shell data/nativetest64/statsd_test/statsd_test
Change-Id: I51e43c7c132f424af8fe20a37f2ad10cc55b5989
2018-03-30 16:21:17 -07:00
TreeHugger Robot
9fef2594f3 Merge "Clean up atoms.proto" into pi-dev 2018-03-30 20:48:09 +00:00
Chenjie Yu
5caaa9d854 Clean up atoms.proto
changes are:
1) for pushed atoms, use attribution node in place of uid when
appropriate
2) name changes to be more consistent

Bug: 73823969
Test: manual test
Change-Id: Iacf7186dbd7a2282f7fe481f43dbbf92e1165b47
2018-03-30 10:11:03 -07:00
Chenjie Yu
6d370f40fe Add unit test ValueMetricProducer on boundary
Mostly to add test to assure the corner cases are covered.
One minor logic change is if two true conditions happen, in the case
when following happen:
(bucket boundary1) -> (condition false) -> (condition true) -> (pull
triggered for the boundary1)
Previously we take the latest. Now we skip the late boundary pull.

Bug: 76384731
Test: unit test
Change-Id: I345c2210a58bf03eb91d65742573073d2668358b
2018-03-29 21:54:54 -07:00
Chenjie Yu
1a0a941c20 Fix StatsCompanionService pull on bucket ends
+ change StatsPullerManager internal time units to be consistent
+ use series of alarms for pullers, instead of use setRepeating

Bug: 76223345
Bug: 75970648
Test: cts test
Change-Id: I9e6ac0ce06541f5ceabd2a8fa444e13d40e36983
2018-03-29 00:11:13 -07:00
Yangster-mac
684d195227 E2e test for periodic alarm.
Test: new test

BUG: b/76281156
Change-Id: I60cb28baaeec6996e946a7cb3358ec8e0aca80e5
2018-03-27 13:26:20 -07:00
TreeHugger Robot
d9afdee26f Merge "Support sliced condition change in GaugeMetric" into pi-dev 2018-03-27 18:27:27 +00:00