Commit Graph

108 Commits

Author SHA1 Message Date
David Chen
56ae0d9a48 Fixes statsd reports missing strings and SCS.
Reports written to disk don't contain the strings used, which will
make this report unusable if there are strings that don't show up
again. We should always include the strings, so this option is
removed entirely.

Also, we hard-coded the wrong number of fields when pulling
ModemActivityInfo. There are actually 10 fields, not 6.

Bug: 79601503
Test: Tested unit-tests pass on marlin-eng.
Change-Id: I6834b096ced77418a9cc2ddd79b08d1c9c447fae
2018-05-11 17:04:56 -07:00
TreeHugger Robot
3f57b16deb Merge "Skip writing metrics to disk if it's entirely composed of no_report_metric" into pi-dev 2018-05-10 02:02:37 +00:00
yro
028091cb15 Skip writing metrics to disk if it's entirely composed of
no_report_metric

Test: unit test, cts
Bug: 79488249
Change-Id: I3e13a6271cc36665a43d0f09d8663e5996224477
2018-05-09 16:03:27 -07:00
David Chen
9e6dbbdadf Fix statsd returning uidmap with empty reports.
We notice devices uploading a bunch of bytes for the uidmap even if
the device is running an empty config, so there are no actual metrics
to report. This hardcodes some logic to skip the inclusion of the
uidmap if there are exactly 0 metrics.

Bug: 79381210
Test: Tested unit-tests on marlin-eng
Change-Id: I96348235341a7faf15ff57d4d1eccac635a3a999
2018-05-07 18:07:19 -07:00
David Chen
48944901f7 Fixes statsd returning too much data at once.
We observe a single ConfigMetricsReportList can be greater than the
safe size for the binder transaction buffer since we only check the
size of the current metrics in progress, but we also return the
previous reports stored on disk.

This change will attempt to send another ConfigMetricsReportList
as soon as possible if there's already a report on disk.

Also fixes a bug when trying to trigger data fetch before the client
has registered the corresponding dataFetchOperation.

Bug: 79201869
Test: Tested manually on marlin-eng
Change-Id: I2d3677162804a27e7a7a95d482d80c46bd994a67
2018-05-04 17:09:16 -07:00
Yao Chen
a62ae51ba9 Merge "Add cmd to let statsd print all logs it received for debugging." into pi-dev 2018-05-04 20:21:18 +00:00
android-build-team Robot
ec41a069fd Merge "Reset statsd and correctly record the dump reason when system server restarts/crashes." into pi-dev 2018-05-04 02:43:03 +00:00
Yangster-mac
892f3d3229 Reset statsd and correctly record the dump reason when system
server restarts/crashes.

Test: statsd test
BUG: b/79161505
Change-Id: I0646c764964f6eafde91f9ae0179a1c837af320d
2018-05-03 17:05:24 -07:00
Yangster-mac
754e29edd7 Turns DEBUG to false in statsd.
Test: statsd test
BUG: b/79161505
Change-Id: Ic6eee527d625b10aa86b2beb4b4c4fc05b051c7d
2018-05-03 21:19:39 +00:00
Yao Chen
876889cb76 Add cmd to let statsd print all logs it received for debugging.
It only works on eng build. And all code is behind a build flag, so the
code will be stripped out in production builds.

Bug: 78239479
Test: manual
Change-Id: I20ee51822d18e6c77ca324a5327712cbed09593e
2018-05-03 10:53:27 -07:00
Yang Lu
493bb2b119 Merge "Lock the pulling alarm handler." into pi-dev 2018-05-02 18:25:18 +00:00
Yangster-mac
9def8e3995 Reduce statsd log data size.
1. Hash the strings in metric dimensions.
2. Optimize the timestamp encoding in bucket.
   Use bucket num for full bucket and millis for
   partial bucket.
3. Encode the dimension path per metric and avoid
   deduping it across dimensons.

Test: statsd test
Change-Id: I18f69654de85edb21a9c835c73edead756295e05
BUG: b/77813755
2018-04-26 04:30:18 -07:00
Chenjie Yu
e36018b272 add dump report reason to reports
+ also change uidmapping version numbers to int64_t

Bug: 78132855
Change-Id: Iac7ea93e4bf651bd65bd03383e7ab4971af4fc29
Fix: 78132855
Test: gts test
2018-04-18 20:19:21 +00:00
David Chen
d37bc23f50 Adds a code when statsd sends intent to getData.
If the data receiver is experiencing delays, there may be a queue of
multiple intents to collect the same data. This timestamp makes it
easy in the receiver to de-dupe these requests to call getData.

Also, we update how StatsCompanionService gets the snapshot by
requesting data for all known apps. I notice that Keep seems to have
a uid active even when it appears uninstalled.

Bug: 77981668
Test: Flashed marlin-eng and manually verified.
Change-Id: I509e19383ec4a5da8746dd0c76ac71a948c6877d
2018-04-13 17:01:13 -07:00
Yangster
6df5fcc126 Lock the pulling alarm handler.
Test: statsd test

BUG: b/77906846

Change-Id: I414771a20babfb2324e47dd8ddbb44eaa088d199
2018-04-13 09:03:20 -07:00
Yao Chen
163d2602db Handle logd reconnect.
When statsd reconnects to logd, statsd will read all logs from buffer again. To prevent us from
reprocessing old events, we do the following:

1. At any given moment, record the largest timestamp(T_max) and last timestamp (check point) that
   we've seen before.
2. When reconnection happens, we look for the check point until we see a new log with a timestamp
   larger than T_max.
   -> If we found the CP, resume after the CP. Success
   -> If we can't find CP, there is definitely log loss. We reset all configs.

Note:
1. Logd has an API to read logs after a certain timestamp. But this api is vulnerable to
time changes from Settings. So we cannot rely on it.

2. If logd inserts a new log (with older timestamp) before CP, we cannot detect it. It's not
   possible to detect it without record all timestamps we have seen.

Test: statsd_test
Bug: 77813113

Change-Id: Ic3fdb47230807606ab11dc994cb162194adb8448
2018-04-10 22:06:03 -07:00
Yangster-mac
15f6bbc24f Flush the bucket when creating the metric producer.
Use int64 for value field.
E2e test for gauge/value metric.

BUG: b/74445671

Test: statsd test.
Change-Id: I823a0bade8f89834bdfb9cf48864852a47d7b63b
2018-04-10 20:25:13 -07:00
Yangster-mac
e68f3a5811 Flush the partial bucket when startd shuts down or config updated.
Test: statsd test

BUG: b/77556036
Change-Id: Ie4a04ace55e07c4529cdff5906ba874f8815f620
2018-04-05 18:05:57 -07:00
David Chen
203bbbf942 Merge "Fix uid map to be simpler and fix partial bucket." into pi-dev 2018-04-05 23:43:45 +00:00
David Chen
bd12527c90 Fix uid map to be simpler and fix partial bucket.
The previous scheme captured periodic snapshots for each config with
complex logic that's unnecessary and wasted memory. We actually don't
need to store any snapshots since we just convert the current state
into a snapshot and also include the deltas (change events) since the
previous report until now.

To make the system more robust, we also include up to 100 of the
deleted apps in the uid map.

Also, fix the wiring of the partial buckets so the metric producers
form partial buckets on both app upgrade and removal, but not on
installation of a new app.

Also, we update StatsCompanionService to also include disabled apps.

Bug: 77607583
Test: Verified unit-tests pass and added new e2e tests.
Change-Id: I98e1f544d6e6571545ae1581c4cebab807596f51
2018-04-05 16:15:01 -07:00
Yangster-mac
b8382a10a0 Retry logs write when it fails.
Report skipped event in statsd.

Test: manual test
BUG: b/77222120
Change-Id: I257f5e76d557893c4eb4a8e8a13396d8b5d1afc0
2018-04-04 17:53:48 -07:00
Yangster-mac
b142cc8add Statsd config TTL
Roughly check the config every hour to see whether the ttl expired.
If so, read the config from disk and recreate the metric manager.

Test: statsd test

BUG: b/77274363

Change-Id: I16838afe5bbe966c3a0f638869751f9b59a5a259
2018-04-04 15:59:43 +00:00
Yangster-mac
c04feba805 Move forward the alarm timestamp when config is added to statsd.
Test: statsd test
BUG: b/77344187

Change-Id: Ieacffaa29422829b8956f2b3fcb2c647c8c3eed9
2018-04-02 18:12:36 -07:00
TreeHugger Robot
46eef8d049 Merge "E2e test for periodic alarm." into pi-dev 2018-03-31 03:04:59 +00:00
Chenjie Yu
1a0a941c20 Fix StatsCompanionService pull on bucket ends
+ change StatsPullerManager internal time units to be consistent
+ use series of alarms for pullers, instead of use setRepeating

Bug: 76223345
Bug: 75970648
Test: cts test
Change-Id: I9e6ac0ce06541f5ceabd2a8fa444e13d40e36983
2018-03-29 00:11:13 -07:00
Yangster-mac
684d195227 E2e test for periodic alarm.
Test: new test

BUG: b/76281156
Change-Id: I60cb28baaeec6996e946a7cb3358ec8e0aca80e5
2018-03-27 13:26:20 -07:00
Yao Chen
52b478b56a Update Guardrail.
+ Config count is 10 per uid
+ Update the limit for metrics, matchers, conditions, etc.

Test: statsd_test

Bug: 73122377

Change-Id: I3e1adfe318d1354a7c9d1bf484855661aa3a1fc8
2018-03-27 11:29:50 -07:00
David Chen
4c6d97a1e4 Fix statsd dropping metrics data.
We can increase the buffer of metrics we store in statsd memory, but
we still request the clients to call getData when the metrics memory
exceeds 128 KB (previously was 90% of 128 KB).

Bug: 76171061
Test: Test that unit-tests still pass.
Change-Id: I901545b364ed313af8c033ce9b40d3cfadb93213
2018-03-22 16:46:54 -07:00
Howard Ro
e51af37475 Merge "Fix recovery of stats data from previous input while using ProtoOutputStream" into pi-dev 2018-03-21 00:31:23 +00:00
yro
4beccbe3de Fix recovery of stats data from previous input while using
ProtoOutputStream

- Specify the length of message to avoid libprotoutil from thinking that
we are trying to write bool
- We only attach the previous dump file to the upload file where config
key matches
- Store ConfigMetricsReport (instead of ConfigMetricsReportList) onto
disk
- Stop use stack after scope in StorageManager
- Migrate UidMap to use ProtoOutputStream and renaming variables to
prevent confusion

Bug: 74021554
Bug: 75968524
Test: manual test, statsd_test, CTS tests
Change-Id: Iedf52633d7f5b985f5a934a3fb5a0c3c3b2e7fd1
2018-03-20 15:00:59 -07:00
Yao Chen
c40a19d2e4 Add uid field annotation in atoms.proto and statd memory usage optimization.
[memory]
  statsd binary size from 664k -> 600k
  memory usage 1978k -> 1813k (with no configs)
  + Avoid initialize any static map in statslog.h to avoid many copies of the map in each include.
    - Do it in cpp so that it is initialized only in places that use them

[Uid annotation]
+ Uid annotation is needed for extracting uid from dimension for UidCpuPuller.
+ After the change, stand-alone uids don't need to be in field 1 anymore.
+ Also added exclusive bit annotation in AppDied
+ Currently only allow one uid field in an Atom. This is to keep things simple until
  we find an exception.

Test: statsd_test
Bug: 73958484
Bug: 72129300

Change-Id: I8a916d5c00d5930e24ae7e0825a57dea19c0e744
2018-03-16 13:56:38 -07:00
David Chen
f384b90049 Removes stats_log proto from uid map in statsd.
We don't need to parse the proto of uid map, so we use the
ProtoOutputStreame class to generate the binary form of the proto
output that's needed for parsing the uid map data.

Test: Verified unit-tests still pass.
Bug: 74010813
Change-Id: Ia2f7572f3b78bb6f7b60e8b14cf5d65428469ab6
2018-03-15 13:33:04 -07:00
Yangster-mac
3fa5d7fb23 Add wall clock timestamp for ConfigMetricsReport and gauge atoms.
Fix the bug when serializing multiple atoms in gauge metric

BUG: b/74159560

Test: new test for ALL_CONDITION_CHANGES sampling method.
Change-Id: I6d33c1efbac92b6e13be2d64c323e090cb1f84aa
2018-03-10 22:25:28 -08:00
yro
1cf2ac5241 Write data to file when StatsCompanionSerivice (system_server) crashes
Bug: 73352867
Change-Id: Iecbb1ae3e29264975771155a878b368cfc2f50f0
Test: statsd_test
2018-03-07 17:59:13 -08:00
Yi Jin
5ee0787024 Use uint64_t instead of long long as API type for consistent reason.
Bug: 74118023
Test: manual
Change-Id: Icd5f506c76d3a008a79cb6c9d2061962ca7fdd40
2018-03-05 18:18:27 -08:00
Yao Chen
06dba5d79c Add API to let metrics directly drop data without writing to an output.
+ Metrics will do flushIfNeeded() to correctly move the clock and informing
  AnomalyTracker the past bucket info, and then clear past buckets.

+ We will still keep the current bucket data for the validity of the future metrics.

Bug: 70571383
Test: statsd_test
Change-Id: Ib13c45574974e7b4e82bd8f305091dc93bda76f5
2018-03-01 15:22:55 -08:00
TreeHugger Robot
6158952c30 Merge "Avoid reading logs that were processed before." 2018-02-28 19:37:05 +00:00
Yao Chen
8f42ba0e2c Avoid reading logs that were processed before.
This could happen when statsd is disconnected from logd reader. When we reconnect, we are going to
get all events from the buffer again.

Bug: 72379125
Test: manual
Change-Id: Ie0122d5452555500c3bdfc1f905a0b1c646efdf7
2018-02-27 15:17:07 -08:00
TreeHugger Robot
03b91d77c4 Merge "Alarm: wakes up statsd and notifies the subscribers." 2018-02-27 23:08:31 +00:00
Yangster-mac
932ececa16 Alarm: wakes up statsd and notifies the subscribers.
Test: manually tested it.
Change-Id: Id796a68976aeb1611183023ba4e9c6a8b8c44bb8
2018-02-27 13:30:48 -08:00
David Chen
926fc7571a Fixes timebase used when dumping reports.
We should be using elapsed realtime for most timestamps in statsd
so that the times can only increase monotonically.

Test: Test that statsd builds and unit-tests passes.
Change-Id: I0bb23e89aa9a6dbf6d56a0c23eec77bdd053f29b
2018-02-23 14:36:06 -08:00
TreeHugger Robot
6189807c12 Merge "Remove unused variables in statsd, and make more warnings show." 2018-02-14 20:12:18 +00:00
Yangster-mac
330af58f2b Use elapsed realtime instead of times based on wall clock, which can jump around and go backwards.
Test: statsd unit test passed

Change-Id: Ib541df99231e171b3be2a24f75632693e36da90e
2018-02-13 23:30:39 -08:00
Yao Chen
4c959cb99e Remove unused variables in statsd, and make more warnings show.
Test: statsd_test

Change-Id: I2c7b674cb615f22c5de90c2de5f2d58108ab2e7f
2018-02-13 15:31:22 -08:00
yro
aab45c1d09 Remove sending broadcast when StatsLogProcessor is being initialized as
its clients have not started to receive broadcasts

This also fixes broken statsd_test's which happens whenever there are
files in /data/misc/stats-data/ which is generated right before reboots.
This would delay the upload time from right after reboot to next upload
cycle but it should not be an issue.

Bug: 73089712
Test: statsd_test
Change-Id: Ida81099c9c9e54804a0c3b3b349096312ef570bc
2018-02-12 22:20:39 +00:00
Yao Chen
8a8d16ceea Statsd CPU optimization.
The key change is to revamp how we parse/store/match a log event, especially how we match repeated
field and attribution nodes, and how we construct dimensions and compare them.

+ We use a integer to encode the field of a log element. And also encode the FieldMatcher into an
integer and a bit mask. The log matching becomes 2 integer operations.

+ Dimension is stored as encoded field and value pair. Checking if 2 dimensions are equal is then
  becoming checking if the underlying integers are equal. The integers are stored contiguously
  in memory, so it's much faster than previous tree structure.

Start review from FieldValue.h

Test: statsd_test + new unit tests

Bug: 72659059

Change-Id: Iec8daeacdd3f39ab297c10ab9cd7b710a9c42e86
2018-02-12 10:38:45 -08:00
Tej Singh
484524a246 Turn off debug logging in statsd
Sets DEBUG to false everywhere and replaces all ALOGD with VLOG so they
do not print with DEBUG false. Leaves all ALOGI, ALOGW and ALOGE as is.

Test: ran all CTS tests and checked "adb logcat -s statsd" to make sure
it wasn't spammy

Change-Id: Iaa8eb3a0a63723ffe40f94f2815f94df877fd432
2018-02-08 13:11:29 -08:00
David Chen
1604957800 Modifies statsd output for start and end times.
We include the start of when the last dump occurred and the current
timestamp. These timestamps are shared across all metrics, so
there's no advantage in duplicating these numbers across all metrics.

Also, we should use elapsed realtime instead of times based on wall
clock, which can jump around and go backwards.

Test: Test that statsd can still build and
adb shell cmd stats dump-report doesn't crash.
Change-Id: I819e5643cee75dfa3e78a58f94c9d61ededa78d7
2018-02-08 09:59:45 -08:00
Chenjie Yu
fa22d65f14 puller cache clearing
+ add adb command to manually clear puller cache
+ try to clear puller cache every 10s

Test: manual test
Change-Id: I8005cacd189de1880fcaeb030efbe21e6d3c0244
2018-02-06 11:11:46 -08:00
TreeHugger Robot
357b63b172 Merge "Cpu usage optimization: 1/ Avoid unnecessary field/dimension proto construction. 2/ use unordered_map for slicing. 3/ Use dimension fields to compare dimension keys." 2018-01-30 17:37:14 +00:00