Commit Graph

74 Commits

Author SHA1 Message Date
Yi Jin
5ee0787024 Use uint64_t instead of long long as API type for consistent reason.
Bug: 74118023
Test: manual
Change-Id: Icd5f506c76d3a008a79cb6c9d2061962ca7fdd40
2018-03-05 18:18:27 -08:00
Yao Chen
06dba5d79c Add API to let metrics directly drop data without writing to an output.
+ Metrics will do flushIfNeeded() to correctly move the clock and informing
  AnomalyTracker the past bucket info, and then clear past buckets.

+ We will still keep the current bucket data for the validity of the future metrics.

Bug: 70571383
Test: statsd_test
Change-Id: Ib13c45574974e7b4e82bd8f305091dc93bda76f5
2018-03-01 15:22:55 -08:00
TreeHugger Robot
6158952c30 Merge "Avoid reading logs that were processed before." 2018-02-28 19:37:05 +00:00
Yao Chen
8f42ba0e2c Avoid reading logs that were processed before.
This could happen when statsd is disconnected from logd reader. When we reconnect, we are going to
get all events from the buffer again.

Bug: 72379125
Test: manual
Change-Id: Ie0122d5452555500c3bdfc1f905a0b1c646efdf7
2018-02-27 15:17:07 -08:00
TreeHugger Robot
03b91d77c4 Merge "Alarm: wakes up statsd and notifies the subscribers." 2018-02-27 23:08:31 +00:00
Yangster-mac
932ececa16 Alarm: wakes up statsd and notifies the subscribers.
Test: manually tested it.
Change-Id: Id796a68976aeb1611183023ba4e9c6a8b8c44bb8
2018-02-27 13:30:48 -08:00
David Chen
926fc7571a Fixes timebase used when dumping reports.
We should be using elapsed realtime for most timestamps in statsd
so that the times can only increase monotonically.

Test: Test that statsd builds and unit-tests passes.
Change-Id: I0bb23e89aa9a6dbf6d56a0c23eec77bdd053f29b
2018-02-23 14:36:06 -08:00
TreeHugger Robot
6189807c12 Merge "Remove unused variables in statsd, and make more warnings show." 2018-02-14 20:12:18 +00:00
Yangster-mac
330af58f2b Use elapsed realtime instead of times based on wall clock, which can jump around and go backwards.
Test: statsd unit test passed

Change-Id: Ib541df99231e171b3be2a24f75632693e36da90e
2018-02-13 23:30:39 -08:00
Yao Chen
4c959cb99e Remove unused variables in statsd, and make more warnings show.
Test: statsd_test

Change-Id: I2c7b674cb615f22c5de90c2de5f2d58108ab2e7f
2018-02-13 15:31:22 -08:00
yro
aab45c1d09 Remove sending broadcast when StatsLogProcessor is being initialized as
its clients have not started to receive broadcasts

This also fixes broken statsd_test's which happens whenever there are
files in /data/misc/stats-data/ which is generated right before reboots.
This would delay the upload time from right after reboot to next upload
cycle but it should not be an issue.

Bug: 73089712
Test: statsd_test
Change-Id: Ida81099c9c9e54804a0c3b3b349096312ef570bc
2018-02-12 22:20:39 +00:00
Yao Chen
8a8d16ceea Statsd CPU optimization.
The key change is to revamp how we parse/store/match a log event, especially how we match repeated
field and attribution nodes, and how we construct dimensions and compare them.

+ We use a integer to encode the field of a log element. And also encode the FieldMatcher into an
integer and a bit mask. The log matching becomes 2 integer operations.

+ Dimension is stored as encoded field and value pair. Checking if 2 dimensions are equal is then
  becoming checking if the underlying integers are equal. The integers are stored contiguously
  in memory, so it's much faster than previous tree structure.

Start review from FieldValue.h

Test: statsd_test + new unit tests

Bug: 72659059

Change-Id: Iec8daeacdd3f39ab297c10ab9cd7b710a9c42e86
2018-02-12 10:38:45 -08:00
Tej Singh
484524a246 Turn off debug logging in statsd
Sets DEBUG to false everywhere and replaces all ALOGD with VLOG so they
do not print with DEBUG false. Leaves all ALOGI, ALOGW and ALOGE as is.

Test: ran all CTS tests and checked "adb logcat -s statsd" to make sure
it wasn't spammy

Change-Id: Iaa8eb3a0a63723ffe40f94f2815f94df877fd432
2018-02-08 13:11:29 -08:00
David Chen
1604957800 Modifies statsd output for start and end times.
We include the start of when the last dump occurred and the current
timestamp. These timestamps are shared across all metrics, so
there's no advantage in duplicating these numbers across all metrics.

Also, we should use elapsed realtime instead of times based on wall
clock, which can jump around and go backwards.

Test: Test that statsd can still build and
adb shell cmd stats dump-report doesn't crash.
Change-Id: I819e5643cee75dfa3e78a58f94c9d61ededa78d7
2018-02-08 09:59:45 -08:00
Chenjie Yu
fa22d65f14 puller cache clearing
+ add adb command to manually clear puller cache
+ try to clear puller cache every 10s

Test: manual test
Change-Id: I8005cacd189de1880fcaeb030efbe21e6d3c0244
2018-02-06 11:11:46 -08:00
TreeHugger Robot
357b63b172 Merge "Cpu usage optimization: 1/ Avoid unnecessary field/dimension proto construction. 2/ use unordered_map for slicing. 3/ Use dimension fields to compare dimension keys." 2018-01-30 17:37:14 +00:00
Yangster-mac
7ba8fc357e Cpu usage optimization:
1/ Avoid unnecessary field/dimension proto construction.
2/ use unordered_map for slicing.
3/ Use dimension fields to compare dimension keys.

Test: all statsd tests passed.
Change-Id: I2f74f78589b7f6ecd0803a2ead822b8d0399f334
2018-01-26 23:17:02 +00:00
Yao Chen
884c8c130f Add more statsd's debugging info to dumpsys.
+ Bugreport will use the non-verbose mode
+ Reuse the log_msg object in LogReader
+ Add logd errors to StatsdStats

Bug: 72383073

Test: manual + statsd_test

Change-Id: Id5a8b103074d034f5ece3c9831c740d44a5df9cd
2018-01-26 12:03:58 -08:00
TreeHugger Robot
d0c260ff41 Merge "Adding guardrails on writing to disk from statsd" 2018-01-25 06:47:29 +00:00
TreeHugger Robot
82c2173b67 Merge "Statsd always includes snapshot of uid map." 2018-01-24 22:31:24 +00:00
TreeHugger Robot
3f9a1a5426 Merge "Fix deadlock for write-disk cmd." 2018-01-24 19:51:22 +00:00
yro
98a28501fe Adding guardrails on writing to disk from statsd
- Limit total number of files to 1000
- Limit total size of files to 5MB
- Remove idle files to be deleted after 30 days

Bug: 69854160
Test: manual testing, statsd, statsd_test
Change-Id: I33148a3b7ca11d413ec2495d5c0659f1ba4485c3
2018-01-24 10:33:54 -08:00
David Chen
cfc311d2f0 Statsd always includes snapshot of uid map.
Statsd will contain at least one snapshot of the uid map. The
previous design was not very robust in case a snapshot was missing.

Also fixes subtle bug with updating the isolated uid mapping since
this should always be kept up to date even if there are no metrics
being used (since metrics may be added later after the isolated uid
was created).

Test: Checked that unit-tests pass on marlin-eng.
Change-Id: I99754ed9016d980564e409b0946a46b398fd12b7
2018-01-23 18:04:03 -08:00
Yangster-mac
8617950962 Fix deadlock for write-disk cmd.
Test: manual tested.
Change-Id: I6c1e1f10bbb3830c932b3d7b57df8d4960c13977
2018-01-23 15:51:17 -08:00
Yangster-mac
68985805f2 Avoid processing log event when there is no uid field.
Test: all statsd unit test passed

Change-Id: Id434d86586950a485b30a244f3c030e8202c1c6d
2018-01-23 16:43:07 +00:00
Yangster-mac
8282d5b8bc Avoid processing the log event when there is no config.
Test: statsd unit test passed
Change-Id: If9840283accdeaa36d956213a1a9fec44204e77d
2018-01-19 10:10:04 -08:00
yro
079cea9a7e Create /data/misc/stats-data/ and /data/misc/stats-service/ in statsd.rc
rather than during the runtime of statsd

The purpose of this change is to prevent causing selinux violation by
trying to mkdir to /data/misc/ directory when statsd doesn't have
permission to do so.

Bug: 71537285
Test: manually tested to make sure that there's no sepolicy violation

Change-Id: I9c4ccecc416f41923c9b24dd44a388d135fecc07
2018-01-12 13:51:48 -08:00
Yangster-mac
d40053eb8b Map isolated uid to host uid when processing log event in statsD.
Test: added test case for isolated uid in Attribution e2e test.
Change-Id: I63d16ebee3e611b1ef0c910e5154cf27766cb330
2018-01-09 21:45:46 -08:00
Yangster-mac
b0d0628a29 Thread-safety at log processor level.
Test: statsd unit test passed.

Change-Id: Ibe8c8d3cc8297875b16ee385c077b71c87353147
2018-01-08 14:59:42 -08:00
Yao Chen
147ce60278 use only string type in the log source whitelist.
+ predefined "AID_X" will be provided as string type to statsd, and we will translate
  to integer uid using the static map.

Test: statsd_test

Change-Id: Ie47d8481e0c456457e6881ebb9cb4ce008e772b8
2018-01-04 09:57:03 -08:00
Yangster-mac
94e197cceb 1/ Change all "name" to id in statsD.
2/ Handle Subscription for alert.
3/ Support no_report_metric

Bug: 69522276
Test: all statsd unit tests passed.
Change-Id: I851b235f2d149b8602b0cad632d5bf541962f40a
2018-01-03 15:34:00 -08:00
Yangster-mac
2087716f2b 1/ Support nested message and repeated fields in statsd.
2/ Filter gauge fields by FieldMatcher.
3/ Wire up wakelock attribution chain.
4/ e2e test: wakelock duration metric with aggregated predicate dimensions.
5/ e2e test: count metric with multiple metric condition links for 2 predicates and 1 non-sliced predicate.

Test: statsd unit test passed.

Change-Id: I89db31cb068184a54e0a892fad710966d3127bc9
2018-01-01 10:01:36 -08:00
Bookatz
857aaa5208 Splits AnomalyTracker into two files
Splits out DurationAnomalyTracker-specific functions into their own
subclass.

Test: the unit tests and CTS tests
Change-Id: Id6eb74d232b4a9c3a932d805d1ba3f0ba43a88b1
2017-12-22 15:05:31 -08:00
Yao Chen
d10f7b1c7b Add log source filtering in statsd to filter out spams.
+ Add log source whitelist in StatsdConfig
+ Some changes in UidMap API. Listener needs to be wp instead of sp.
+ Update dogfood app config to have log source
+ Increase the stats service thread pool size to 10 (9+1).

TODO: add unit tests(b/70805664). This unit test takes some time to write.

Test: statsd_test & manual

Change-Id: I129b1cc13db5114db7417580962bd7cc4438519d
2017-12-20 18:45:43 -08:00
Chenjie Yu
85ed838713 align metrics to 5min bundary
We use one alarm clock for all pulled atoms.
If metrics from different configs are not aligned,
the clock will be set to repeat at higher and higher
frequency, and consume a lot of battery.
Current implementation assumes a 5min minimum bucket
size. New metric start time is set to be aligned to
the start time of statsd in the next 5min.
So it will ignore events up to 5min.

align puller alarm to minute bundary

Test: unit test
Change-Id: I77ffa3c13de363c780b1000181b9a9b780dd0846
2017-12-16 17:08:46 -08:00
TreeHugger Robot
2d48f13c7d Merge "Only create ProtoOutputStream when onGetData() is called." 2017-12-13 23:16:59 +00:00
Yao Chen
288c600013 Only create ProtoOutputStream when onGetData() is called.
The exception is EventMetricProducer. Each EventMetricProducer will still have a ProtoOutputStream
Because LogEvent comes as a fixed 4K, it's more memory efficient to have an 8k ProtoOutputStream for
storing the events.

Also removed finish() api in MetricProducer, which was intended to use with Dropbox.

Test: statsd_test & dogfood app
Bug: 70393808
Change-Id: I2efe4ecc76a88060a9aa5eb49d1fa6ea60bc5da8
2017-12-12 14:20:32 -08:00
yro
03faf09330 Migrate disk directory from /data/system/ to /data/misc/
Test: statsd, statsd_test
Change-Id: I6d2fe97afd79fb9b36d180d5e6e6a7a166a228b7
2017-12-12 00:17:50 -08:00
David Chen
d9269e2ee7 Adds rate limit to checking byte size.
Since there is a separate guardrail for memory used by uid map, we
no longer add the memory from uid map with the memory per each
config's metrics. We also prevent the byte size check from happening
too frequently. In order to mock the MetricsManager, we refactor
some of the existing methods.

Test: Added unit-tests and verified they all pass on marlin.
Change-Id: I15cf105f7d95f4016fdb0443b0a33eebe862cafb
2017-12-07 18:22:58 -08:00
Yao Chen
312e898325 Let the event flow to MetricsManager. Comment out the size check only.
Test: statsd_test, and manual
Change-Id: I862967510eaf4d402471e9dd6b9c85f6037dd7e1
2017-12-05 15:29:03 -08:00
Yao Chen
0b73ccad8f Urgent fix. Once UidMap size exceeds the limit, statsd triggers data drop every time a log comes in
Test: manual
Change-Id: Idf93e5aca19b80acf964670fa4bc9f1f0781df1f
2017-12-05 09:37:06 -08:00
David Chen
12942956bc Tiny fix to bug when statsd should clear data.
Previously, we always sent a broadcast, even after we have exceeded
the memory limit for this config key. We switch the order so that
we drop the data if the limit is exceeded. If greater than 90% of the
way to the limit, we send the broadcast.

We need to find a way to unit-test this behavior.

Test: N/A.
Change-Id: I6ea40b9e34dceb19805d9af24495d72878f787e0
2017-12-04 14:28:43 -08:00
David Chen
c136f45aee Adds guardrail for memory usage for statsd uid map.
Checks if current memory usage of uid map is above a configured limit
and if so, we start deleting snapshots. If there are no more
snapshots, we begin deleting two of the deltas. Also records stats
in the guardrail StatsdStats. Also fixes an edge case where a config
is added after the snapshots are added. We request a snapshot of all
installed uid's at that moment. Finally, adds the uid map memory size
when determining if we should send a broadcast to trigger collection.

Test: Added unit-tests and check they pass on marlin.
Change-Id: Id5d86378bd1efe12a06b409164c777c0c6f4e3ab
2017-11-29 14:55:15 -08:00
TreeHugger Robot
3e585ecb51 Merge "Some fixes in StatsdStats, and add some unit tests" 2017-11-28 06:58:54 +00:00
Yao Chen
69f1baf7dd Some fixes in StatsdStats, and add some unit tests
+ Add timestamp for when metric data is reported.

Test: statsd_test

Change-Id: Ief5ec5172feed4ec74b7422b77cf69ec8361ef2f
2017-11-27 20:45:16 -08:00
Bookatz
cc5adef2d0 Statsd anomaly detection - fixes
Fixes a few items in AnomalyTracker, especially to do with what happens
when an anomaly alarm fires.

Test: unit tests still pass
Change-Id: Ia89bd617442e952e587336b890c3ca67430b5e21
2017-11-27 15:35:40 -08:00
Yao Chen
b356151e63 Add StatsdStats and guardrail.
+ StatsdStats is the global class that tracks the stats about statsd.

+ Added guardrail for classes that have a map which could potentially grow
  unboundedly with the number of logs.

TODO: add unit tests & CTS for StatsdStats, and guardrail
      add stats for pulled atoms.

Test: statsd_test

Change-Id: I0ea562de4dd3f6162f7923a9c193420b482c1d51
2017-11-27 10:52:54 -08:00
yro
947fbce521 Captures metrics on disk when devices reboot and shutdown. Specifically,
1. Create intent receiver in StatsCompanionService to listen to shutdown
events.
2. Create StatsWriter class to handle disk writes and deleting files.
3. Update StatsLogProcessor, ConfigManager, and StatsService to handle
files on disk using StatsWriter.
4. Add a wrapper for ConfigMetricsReport.

Still TODO is to be able to add a guardrail to prevent accumulating
excessive amount files on disk, which will be followed up by another
change.

Test: statsd, statsd_test
Change-Id: Ia0b3af315af545daa8b0078b3700c600aa7c285f
2017-11-22 18:39:23 -08:00
Yangster
7c334a129e Make member function as const whenever possible.
Test: unit tests passed.
Change-Id: I751cabf305a4b5aa2095853cc951837da0df4c78
2017-11-22 14:28:00 -08:00
Yangster-mac
e2cd6d509b 1/ Duration anomaly tracker with alarm.
2/ Init anomaly from config based on the public language.
3/ Unit tests for anomaly detection in count/gauge producer.
4/ Revisit the duration tracker logic.

Test: unit test passed.
Change-Id: I2423c0e0f05b1e37626954de9e749303423963f2
2017-11-20 15:37:24 -08:00