Commit Graph

94 Commits

Author SHA1 Message Date
Yangster
6df5fcc126 Lock the pulling alarm handler.
Test: statsd test

BUG: b/77906846

Change-Id: I414771a20babfb2324e47dd8ddbb44eaa088d199
2018-04-13 09:03:20 -07:00
Yao Chen
163d2602db Handle logd reconnect.
When statsd reconnects to logd, statsd will read all logs from buffer again. To prevent us from
reprocessing old events, we do the following:

1. At any given moment, record the largest timestamp (T_max) and the last timestamp (check point)
   that we've seen before.
2. When reconnection happens, we look for the check point until we see a new log with a timestamp
   larger than T_max.
   -> If we found the CP, resume after the CP. Success
   -> If we can't find CP, there is definitely log loss. We reset all configs.

Note:
1. Logd has an API to read logs after a certain timestamp, but this API is vulnerable to
time changes from Settings, so we cannot rely on it.

2. If logd inserts a new log (with older timestamp) before CP, we cannot detect it. It's not
   possible to detect it without recording all the timestamps we have seen.
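
The check-point search described above can be sketched as follows. This is a minimal illustration, not the code from the change: the names ReconnectFilter, Action, onReconnect and onLog are hypothetical, and it assumes a log is identified by its timestamp alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; this is not statsd's actual class.
enum class Action {
    kProcess,           // normal event, handle it
    kSkip,              // already seen before the reconnect, ignore it
    kProcessAfterReset  // check point not found: reset all configs, then handle it
};

class ReconnectFilter {
public:
    // Call when the reader reconnects and the buffer is replayed from the start.
    void onReconnect() { mSearching = mSeenAny; }

    Action onLog(int64_t ts) {
        assert(ts >= 0);  // timestamps are assumed to be non-negative
        if (mSearching) {
            if (ts == mCheckpoint) {
                // Found the check point (CP); everything after it is new.
                mSearching = false;
                return Action::kSkip;
            }
            if (ts > mMaxTs) {
                // A log newer than T_max appeared before the CP: logs were lost.
                mSearching = false;
                record(ts);
                return Action::kProcessAfterReset;
            }
            return Action::kSkip;  // old event from the replayed buffer
        }
        record(ts);
        return Action::kProcess;
    }

private:
    void record(int64_t ts) {
        mCheckpoint = ts;                // last timestamp seen
        mMaxTs = std::max(mMaxTs, ts);   // largest timestamp seen (T_max)
        mSeenAny = true;
    }

    bool mSeenAny = false;
    bool mSearching = false;
    int64_t mCheckpoint = 0;
    int64_t mMaxTs = 0;
};
```

As note 2 says, a log inserted before the CP with an older timestamp is silently skipped by this scheme.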

Test: statsd_test
Bug: 77813113

Change-Id: Ic3fdb47230807606ab11dc994cb162194adb8448
2018-04-10 22:06:03 -07:00
Yangster-mac
15f6bbc24f Flush the bucket when creating the metric producer.
Use int64 for value field.
E2e test for gauge/value metric.

BUG: b/74445671

Test: statsd test.
Change-Id: I823a0bade8f89834bdfb9cf48864852a47d7b63b
2018-04-10 20:25:13 -07:00
Yangster-mac
e68f3a5811 Flush the partial bucket when statsd shuts down or the config is updated.
Test: statsd test

BUG: b/77556036
Change-Id: Ie4a04ace55e07c4529cdff5906ba874f8815f620
2018-04-05 18:05:57 -07:00
David Chen
203bbbf942 Merge "Fix uid map to be simpler and fix partial bucket." into pi-dev
2018-04-05 23:43:45 +00:00
David Chen
bd12527c90 Fix uid map to be simpler and fix partial bucket.
The previous scheme captured periodic snapshots for each config with
complex logic that's unnecessary and wasted memory. We actually don't
need to store any snapshots since we just convert the current state
into a snapshot and also include the deltas (change events) since the
previous report until now.

To make the system more robust, we also include up to 100 of the
deleted apps in the uid map.

Also, fix the wiring of the partial buckets so the metric producers
form partial buckets on both app upgrade and removal, but not on
installation of a new app.

Also, we update StatsCompanionService to also include disabled apps.

Bug: 77607583
Test: Verified unit-tests pass and added new e2e tests.
Change-Id: I98e1f544d6e6571545ae1581c4cebab807596f51
2018-04-05 16:15:01 -07:00
Yangster-mac
b8382a10a0 Retry logs write when it fails.
Report skipped event in statsd.

Test: manual test
BUG: b/77222120
Change-Id: I257f5e76d557893c4eb4a8e8a13396d8b5d1afc0
2018-04-04 17:53:48 -07:00
Yangster-mac
b142cc8add Statsd config TTL
Roughly check the config every hour to see whether the ttl expired.
If so, read the config from disk and recreate the metric manager.

Test: statsd test

BUG: b/77274363

Change-Id: I16838afe5bbe966c3a0f638869751f9b59a5a259
2018-04-04 15:59:43 +00:00
Yangster-mac
c04feba805 Move forward the alarm timestamp when config is added to statsd.
Test: statsd test
BUG: b/77344187

Change-Id: Ieacffaa29422829b8956f2b3fcb2c647c8c3eed9
2018-04-02 18:12:36 -07:00
TreeHugger Robot
46eef8d049 Merge "E2e test for periodic alarm." into pi-dev
2018-03-31 03:04:59 +00:00
Chenjie Yu
1a0a941c20 Fix StatsCompanionService pull on bucket ends
+ change StatsPullerManager internal time units to be consistent
+ use a series of alarms for pullers, instead of using setRepeating

Bug: 76223345
Bug: 75970648
Test: cts test
Change-Id: I9e6ac0ce06541f5ceabd2a8fa444e13d40e36983
2018-03-29 00:11:13 -07:00
Yangster-mac
684d195227 E2e test for periodic alarm.
Test: new test

BUG: b/76281156
Change-Id: I60cb28baaeec6996e946a7cb3358ec8e0aca80e5
2018-03-27 13:26:20 -07:00
Yao Chen
52b478b56a Update Guardrail.
+ Config count is 10 per uid
+ Update the limit for metrics, matchers, conditions, etc.

Test: statsd_test

Bug: 73122377

Change-Id: I3e1adfe318d1354a7c9d1bf484855661aa3a1fc8
2018-03-27 11:29:50 -07:00
David Chen
4c6d97a1e4 Fix statsd dropping metrics data.
We can increase the buffer of metrics we store in statsd memory, but
we still request the clients to call getData when the metrics memory
exceeds 128 KB (previously was 90% of 128 KB).

Bug: 76171061
Test: Test that unit-tests still pass.
Change-Id: I901545b364ed313af8c033ce9b40d3cfadb93213
2018-03-22 16:46:54 -07:00
Howard Ro
e51af37475 Merge "Fix recovery of stats data from previous input while using ProtoOutputStream" into pi-dev
2018-03-21 00:31:23 +00:00
yro
4beccbe3de Fix recovery of stats data from previous input while using
ProtoOutputStream

- Specify the length of the message to prevent libprotoutil from thinking
that we are trying to write a bool
- We only attach the previous dump file to the upload file where config
key matches
- Store ConfigMetricsReport (instead of ConfigMetricsReportList) onto
disk
- Stop using stack memory after it goes out of scope in StorageManager
- Migrate UidMap to use ProtoOutputStream and rename variables to
prevent confusion

Bug: 74021554
Bug: 75968524
Test: manual test, statsd_test, CTS tests
Change-Id: Iedf52633d7f5b985f5a934a3fb5a0c3c3b2e7fd1
2018-03-20 15:00:59 -07:00
Yao Chen
c40a19d2e4 Add uid field annotation in atoms.proto and statsd memory usage optimization.
[memory]
  statsd binary size from 664k -> 600k
  memory usage 1978k -> 1813k (with no configs)
  + Avoid initializing any static map in statslog.h to avoid many copies of the map in each include.
    - Do it in cpp so that it is initialized only in places that use them

[Uid annotation]
+ Uid annotation is needed for extracting uid from dimension for UidCpuPuller.
+ After the change, stand-alone uids don't need to be in field 1 anymore.
+ Also added exclusive bit annotation in AppDied
+ Currently only allow one uid field in an Atom. This is to keep things simple until
  we find an exception.

Test: statsd_test
Bug: 73958484
Bug: 72129300

Change-Id: I8a916d5c00d5930e24ae7e0825a57dea19c0e744
2018-03-16 13:56:38 -07:00
David Chen
f384b90049 Removes stats_log proto from uid map in statsd.
We don't need to parse the proto of uid map, so we use the
ProtoOutputStream class to generate the binary form of the proto
output that's needed for parsing the uid map data.

Test: Verified unit-tests still pass.
Bug: 74010813
Change-Id: Ia2f7572f3b78bb6f7b60e8b14cf5d65428469ab6
2018-03-15 13:33:04 -07:00
Yangster-mac
3fa5d7fb23 Add wall clock timestamp for ConfigMetricsReport and gauge atoms.
Fix the bug when serializing multiple atoms in gauge metric

BUG: b/74159560

Test: new test for ALL_CONDITION_CHANGES sampling method.
Change-Id: I6d33c1efbac92b6e13be2d64c323e090cb1f84aa
2018-03-10 22:25:28 -08:00
yro
1cf2ac5241 Write data to file when StatsCompanionService (system_server) crashes
Bug: 73352867
Change-Id: Iecbb1ae3e29264975771155a878b368cfc2f50f0
Test: statsd_test
2018-03-07 17:59:13 -08:00
Yi Jin
5ee0787024 Use uint64_t instead of long long as API type for consistency.
Bug: 74118023
Test: manual
Change-Id: Icd5f506c76d3a008a79cb6c9d2061962ca7fdd40
2018-03-05 18:18:27 -08:00
Yao Chen
06dba5d79c Add API to let metrics directly drop data without writing to an output.
+ Metrics will do flushIfNeeded() to correctly move the clock and inform
  AnomalyTracker of the past bucket info, and then clear past buckets.

+ We will still keep the current bucket data for the validity of the future metrics.

Bug: 70571383
Test: statsd_test
Change-Id: Ib13c45574974e7b4e82bd8f305091dc93bda76f5
2018-03-01 15:22:55 -08:00
TreeHugger Robot
6158952c30 Merge "Avoid reading logs that were processed before."
2018-02-28 19:37:05 +00:00
Yao Chen
8f42ba0e2c Avoid reading logs that were processed before.
This could happen when statsd is disconnected from the logd reader. When we reconnect, we get
all events from the buffer again.

Bug: 72379125
Test: manual
Change-Id: Ie0122d5452555500c3bdfc1f905a0b1c646efdf7
2018-02-27 15:17:07 -08:00
TreeHugger Robot
03b91d77c4 Merge "Alarm: wakes up statsd and notifies the subscribers."
2018-02-27 23:08:31 +00:00
Yangster-mac
932ececa16 Alarm: wakes up statsd and notifies the subscribers.
Test: manually tested it.
Change-Id: Id796a68976aeb1611183023ba4e9c6a8b8c44bb8
2018-02-27 13:30:48 -08:00
David Chen
926fc7571a Fixes timebase used when dumping reports.
We should be using elapsed realtime for most timestamps in statsd
so that the times can only increase monotonically.
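
The difference between the two time bases can be illustrated with the standard C++ clocks. This is only a sketch under assumptions: std::chrono::steady_clock stands in for Android's elapsed realtime here, and the helper names monotonicNs, wallClockNs and elapsedNsSince are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Monotonic time in nanoseconds: never goes backwards, unaffected by
// the user changing the date in Settings.
inline int64_t monotonicNs() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
}

// Wall-clock time in nanoseconds: can jump forwards or backwards when
// the system time is adjusted, so it is unsafe for measuring intervals.
inline int64_t wallClockNs() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(system_clock::now().time_since_epoch()).count();
}

// Interval measured on the monotonic clock; the result is never negative.
inline int64_t elapsedNsSince(int64_t startNs) {
    const int64_t now = monotonicNs();
    assert(now >= startNs);  // guaranteed by steady_clock
    return now - startNs;
}
```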

Test: Test that statsd builds and unit-tests pass.
Change-Id: I0bb23e89aa9a6dbf6d56a0c23eec77bdd053f29b
2018-02-23 14:36:06 -08:00
TreeHugger Robot
6189807c12 Merge "Remove unused variables in statsd, and make more warnings show."
2018-02-14 20:12:18 +00:00
Yangster-mac
330af58f2b Use elapsed realtime instead of times based on wall clock, which can jump around and go backwards.
Test: statsd unit test passed

Change-Id: Ib541df99231e171b3be2a24f75632693e36da90e
2018-02-13 23:30:39 -08:00
Yao Chen
4c959cb99e Remove unused variables in statsd, and make more warnings show.
Test: statsd_test

Change-Id: I2c7b674cb615f22c5de90c2de5f2d58108ab2e7f
2018-02-13 15:31:22 -08:00
yro
aab45c1d09 Remove sending broadcast when StatsLogProcessor is being initialized as
its clients have not started to receive broadcasts

This also fixes broken statsd_tests, which happen whenever there are
files in /data/misc/stats-data/ that were generated right before a reboot.
This would delay the upload time from right after reboot to next upload
cycle but it should not be an issue.

Bug: 73089712
Test: statsd_test
Change-Id: Ida81099c9c9e54804a0c3b3b349096312ef570bc
2018-02-12 22:20:39 +00:00
Yao Chen
8a8d16ceea Statsd CPU optimization.
The key change is to revamp how we parse/store/match a log event, especially how we match repeated
field and attribution nodes, and how we construct dimensions and compare them.

+ We use an integer to encode the field of a log element. And also encode the FieldMatcher into an
integer and a bit mask. The log matching becomes 2 integer operations.

+ Dimension is stored as encoded field and value pair. Checking if 2 dimensions are equal then
  becomes checking if the underlying integers are equal. The integers are stored contiguously
  in memory, so it's much faster than previous tree structure.
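
The "2 integer operations" idea can be sketched like this. The bit layout is an assumption made for illustration (one byte per nesting level) and is not taken from FieldValue.h; encodeField, Matcher, makeMatcher, matches and anyPosition are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: one byte per nesting level of the field path,
// e.g. field 1 -> repeated position 3 -> child field 2 becomes 0x010302.
inline uint32_t encodeField(uint8_t l0, uint8_t l1 = 0, uint8_t l2 = 0) {
    return (uint32_t{l0} << 16) | (uint32_t{l1} << 8) | uint32_t{l2};
}

// A matcher is an expected value plus a mask selecting the bits that matter.
struct Matcher {
    uint32_t value;
    uint32_t mask;
};

inline Matcher makeMatcher(uint32_t value, uint32_t mask) {
    assert((value & ~mask) == 0);  // value must not set bits the mask ignores
    return Matcher{value, mask};
}

// Matching is one AND and one comparison.
inline bool matches(const Matcher& m, uint32_t field) {
    return (field & m.mask) == m.value;
}

// Matches child field l2 under top-level field l0 at ANY repeated position:
// the middle byte is masked out.
inline Matcher anyPosition(uint8_t l0, uint8_t l2) {
    return makeMatcher(encodeField(l0, 0, l2), 0x00FF00FFu);
}
```

With this encoding, anyPosition(1, 2) matches encodeField(1, 3, 2) and encodeField(1, 7, 2) but not encodeField(1, 3, 4).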

Start review from FieldValue.h

Test: statsd_test + new unit tests

Bug: 72659059

Change-Id: Iec8daeacdd3f39ab297c10ab9cd7b710a9c42e86
2018-02-12 10:38:45 -08:00
Tej Singh
484524a246 Turn off debug logging in statsd
Sets DEBUG to false everywhere and replaces all ALOGD with VLOG so they
do not print with DEBUG false. Leaves all ALOGI, ALOGW and ALOGE as is.

Test: ran all CTS tests and checked "adb logcat -s statsd" to make sure
it wasn't spammy

Change-Id: Iaa8eb3a0a63723ffe40f94f2815f94df877fd432
2018-02-08 13:11:29 -08:00
David Chen
1604957800 Modifies statsd output for start and end times.
We include the start of when the last dump occurred and the current
timestamp. These timestamps are shared across all metrics, so
there's no advantage in duplicating these numbers across all metrics.

Also, we should use elapsed realtime instead of times based on wall
clock, which can jump around and go backwards.

Test: Test that statsd can still build and
adb shell cmd stats dump-report doesn't crash.
Change-Id: I819e5643cee75dfa3e78a58f94c9d61ededa78d7
2018-02-08 09:59:45 -08:00
Chenjie Yu
fa22d65f14 puller cache clearing
+ add adb command to manually clear puller cache
+ try to clear puller cache every 10s

Test: manual test
Change-Id: I8005cacd189de1880fcaeb030efbe21e6d3c0244
2018-02-06 11:11:46 -08:00
TreeHugger Robot
357b63b172 Merge "Cpu usage optimization: 1/ Avoid unnecessary field/dimension proto construction. 2/ use unordered_map for slicing. 3/ Use dimension fields to compare dimension keys."
2018-01-30 17:37:14 +00:00
Yangster-mac
7ba8fc357e Cpu usage optimization:
1/ Avoid unnecessary field/dimension proto construction.
2/ use unordered_map for slicing.
3/ Use dimension fields to compare dimension keys.

Test: all statsd tests passed.
Change-Id: I2f74f78589b7f6ecd0803a2ead822b8d0399f334
2018-01-26 23:17:02 +00:00
Yao Chen
884c8c130f Add more statsd's debugging info to dumpsys.
+ Bugreport will use the non-verbose mode
+ Reuse the log_msg object in LogReader
+ Add logd errors to StatsdStats

Bug: 72383073

Test: manual + statsd_test

Change-Id: Id5a8b103074d034f5ece3c9831c740d44a5df9cd
2018-01-26 12:03:58 -08:00
TreeHugger Robot
d0c260ff41 Merge "Adding guardrails on writing to disk from statsd"
2018-01-25 06:47:29 +00:00
TreeHugger Robot
82c2173b67 Merge "Statsd always includes snapshot of uid map."
2018-01-24 22:31:24 +00:00
TreeHugger Robot
3f9a1a5426 Merge "Fix deadlock for write-disk cmd."
2018-01-24 19:51:22 +00:00
yro
98a28501fe Adding guardrails on writing to disk from statsd
- Limit total number of files to 1000
- Limit total size of files to 5MB
- Delete idle files after 30 days
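
A rough sketch of how these three limits could be enforced together. Everything here is hypothetical (FileInfo, filesToDelete, the constant names); it assumes 5MB means 5 * 1024 * 1024 bytes and that the oldest files are evicted first when a limit is exceeded, neither of which is stated in the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one file in the stats directory.
struct FileInfo {
    std::string name;
    int64_t sizeBytes;
    int64_t lastModifiedSec;
};

constexpr std::size_t kMaxFiles = 1000;
constexpr int64_t kMaxTotalBytes = 5 * 1024 * 1024;   // assumed meaning of "5MB"
constexpr int64_t kMaxAgeSec = 30LL * 24 * 60 * 60;   // 30 days

// Returns the names of the files to delete, oldest first.
std::vector<std::string> filesToDelete(std::vector<FileInfo> files, int64_t nowSec) {
    // Sort newest first so the oldest file is always at the back.
    std::sort(files.begin(), files.end(), [](const FileInfo& a, const FileInfo& b) {
        return a.lastModifiedSec > b.lastModifiedSec;
    });

    int64_t total = 0;
    for (const FileInfo& f : files) {
        assert(f.sizeBytes >= 0);
        total += f.sizeBytes;
    }

    std::vector<std::string> doomed;
    while (!files.empty()) {
        const FileInfo& oldest = files.back();
        const bool expired = nowSec - oldest.lastModifiedSec > kMaxAgeSec;
        const bool overLimit = files.size() > kMaxFiles || total > kMaxTotalBytes;
        if (!expired && !overLimit) break;  // the oldest file is fine, so all are
        total -= oldest.sizeBytes;
        doomed.push_back(oldest.name);
        files.pop_back();
    }
    return doomed;
}
```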

Bug: 69854160
Test: manual testing, statsd, statsd_test
Change-Id: I33148a3b7ca11d413ec2495d5c0659f1ba4485c3
2018-01-24 10:33:54 -08:00
David Chen
cfc311d2f0 Statsd always includes snapshot of uid map.
Statsd will contain at least one snapshot of the uid map. The
previous design was not very robust in case a snapshot was missing.

Also fixes subtle bug with updating the isolated uid mapping since
this should always be kept up to date even if there are no metrics
being used (since metrics may be added later after the isolated uid
was created).

Test: Checked that unit-tests pass on marlin-eng.
Change-Id: I99754ed9016d980564e409b0946a46b398fd12b7
2018-01-23 18:04:03 -08:00
Yangster-mac
8617950962 Fix deadlock for write-disk cmd.
Test: manually tested.
Change-Id: I6c1e1f10bbb3830c932b3d7b57df8d4960c13977
2018-01-23 15:51:17 -08:00
Yangster-mac
68985805f2 Avoid processing log event when there is no uid field.
Test: all statsd unit test passed

Change-Id: Id434d86586950a485b30a244f3c030e8202c1c6d
2018-01-23 16:43:07 +00:00
Yangster-mac
8282d5b8bc Avoid processing the log event when there is no config.
Test: statsd unit test passed
Change-Id: If9840283accdeaa36d956213a1a9fec44204e77d
2018-01-19 10:10:04 -08:00
yro
079cea9a7e Create /data/misc/stats-data/ and /data/misc/stats-service/ in statsd.rc
rather than during the runtime of statsd

The purpose of this change is to prevent causing selinux violation by
trying to mkdir to /data/misc/ directory when statsd doesn't have
permission to do so.

Bug: 71537285
Test: manually tested to make sure that there's no sepolicy violation

Change-Id: I9c4ccecc416f41923c9b24dd44a388d135fecc07
2018-01-12 13:51:48 -08:00
Yangster-mac
d40053eb8b Map isolated uid to host uid when processing log event in statsd.
Test: added test case for isolated uid in Attribution e2e test.
Change-Id: I63d16ebee3e611b1ef0c910e5154cf27766cb330
2018-01-09 21:45:46 -08:00
Yangster-mac
b0d0628a29 Thread-safety at log processor level.
Test: statsd unit test passed.

Change-Id: Ibe8c8d3cc8297875b16ee385c077b71c87353147
2018-01-08 14:59:42 -08:00
Yao Chen
147ce60278 use only string type in the log source whitelist.
+ predefined "AID_X" will be provided as string type to statsd, and we will translate
  to integer uid using the static map.

Test: statsd_test

Change-Id: Ie47d8481e0c456457e6881ebb9cb4ce008e772b8
2018-01-04 09:57:03 -08:00