This repository was archived by the owner on Dec 30, 2024. It is now read-only.

Commit ca6c058

Update to version v1.3.0
1 parent 1fecbd3 commit ca6c058

File tree

227 files changed: 79156 additions, 13087 deletions


CHANGELOG.md

Lines changed: 41 additions & 18 deletions
````diff
@@ -1,25 +1,48 @@
 # Change Log
 
-All notable changes to this project will be documented in this file.
+All notable changes to this project will be documented in this file.
 
-The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
-and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [1.2.0] - 2020-10-29
-### Added
-- New and simplified interactive Amazon QuickSight dashboard that is now automatically generated through an AWS CloudFormation deployment and that customers can extend to suit their business case
+## [1.3.0] - 2020-11-24
 
-### Updated
-- Updated to AWS CDK v1.69.0
-- Consolidate Amazon S3 access Log bucket across the solution. All access log files have a prefix that corresponds to the bucket for which they are generated
+### Changed
 
-## [1.1.0] - 2020-09-29
-### Updated
-- S3 storage for inference outputs to use Apache Parquet
-- Add partitioning to AWS Glue tables
-- Update to AWS CDK v1.63.0
-- Update to AWS SDK v2.755.0
+- Refactored the implementation to reuse the following architecture patterns from [AWS Solutions Constructs](https://aws.amazon.com/solutions/constructs/)
+  - aws-kinesisfirehose-s3
+  - aws-kinesisstreams-lambda
+  - aws-lambda-step-function
 
-## [1.0.0] - 2020-08-28
-### Added
-- Initial release
+### Updated
+
+- The join condition for Topic Modeling in the Amazon QuickSight dataset to provide accurate topic identification for a specific run
+- ID and name generation for Amazon QuickSight resources to use a dynamic value based on the stack name
+- AWS CDK version to 1.73.0
+- AWS SDK version to 2.790.0
+
+## [1.2.0] - 2020-10-29
+
+### Added
+
+- New and simplified interactive Amazon QuickSight dashboard that is now automatically generated through an AWS CloudFormation deployment and that customers can extend to suit their business case
+
+### Updated
+
+- Updated to AWS CDK v1.69.0
+- Consolidated the Amazon S3 access log bucket across the solution. All access log files have a prefix that corresponds to the bucket for which they are generated
+
+## [1.1.0] - 2020-09-29
+
+### Updated
+
+- S3 storage for inference outputs to use Apache Parquet
+- Added partitioning to AWS Glue tables
+- Updated to AWS CDK v1.63.0
+- Updated to AWS SDK v2.755.0
+
+## [1.0.0] - 2020-08-28
+
+### Added
+
+- Initial release
````
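The 1.3.0 change to the QuickSight join condition can be sketched in a few lines. This is a hypothetical illustration only — the field names `jobId`, `topicId`, and `terms` are assumptions, not the solution's actual dataset schema. Each topic-modeling run produces its own topic IDs, so joining on the topic ID alone mixes topics from different runs, while joining on both the run (job) ID and the topic ID isolates a specific run:

```javascript
// Hypothetical illustration of the 1.3.0 join-condition fix (field names are
// assumptions, not the solution's actual schema). Each topic-modeling run
// produces topics keyed by (jobId, topicId); topicId alone is not unique.
const topics = [
  { jobId: 'run-1', topicId: '000', terms: ['price', 'cost'] },
  { jobId: 'run-2', topicId: '000', terms: ['ship', 'delay'] },
];
const mappings = [{ jobId: 'run-2', topicId: '000', tweetId: 't-42' }];

// Join on topicId only: matches topics from every run (inaccurate).
const looseJoin = mappings.flatMap((m) =>
  topics.filter((t) => t.topicId === m.topicId)
);

// Join on (jobId, topicId): matches only the topic from the same run.
const strictJoin = mappings.flatMap((m) =>
  topics.filter((t) => t.jobId === m.jobId && t.topicId === m.topicId)
);

console.log(looseJoin.length, strictJoin.length); // → 2 1
```

The loose join returns two rows for a single mapping because both runs reused topic ID `000`; the stricter condition keeps the result scoped to one run.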

README.md

Lines changed: 84 additions & 48 deletions
````diff
@@ -5,26 +5,35 @@ The Discovering Hot Topics Using Machine Learning solution helps you identify th
 The solution uses machine learning algorithms to automate digital asset (text and image) ingestion and perform near real-time topic modeling, sentiment analysis, and image detection. The solution then visualizes these large-scale customer analyses using an Amazon QuickSight dashboard. This guide provides step-by-step instructions for building a dashboard that provides you with the context and insights necessary to identify trends that help or harm your brand.
 
 The solution provides the following key features:
-* **Performs topic modeling to detect dominant topics**: identifies the terms that collectively form a topic from within customer feedback
-* **Identifies the sentiment of what customers are saying**: uses contextual semantic search to understand the nature of online discussions
-* **Determines if images associated with your brand contain unsafe content**: detects unsafe and negative imagery in content
-* **Helps customers identify insights in near real-time**: you can use a visualization dashboard to better understand context, threats, and opportunities almost instantly
+
+- **Performs topic modeling to detect dominant topics**: identifies the terms that collectively form a topic from within customer feedback
+- **Identifies the sentiment of what customers are saying**: uses contextual semantic search to understand the nature of online discussions
+- **Determines if images associated with your brand contain unsafe content**: detects unsafe and negative imagery in content
+- **Helps customers identify insights in near real-time**: you can use a visualization dashboard to better understand context, threats, and opportunities almost instantly
 
 For an overview and solution deployment guide, please visit [Discovering Hot Topics using Machine Learning](https://aws.amazon.com/solutions/implementations/discovering-hot-topics-using-machine-learning)
 
-## Architecture Diagram
+## On this Page
+
+- [Architecture Overview](#architecture-overview)
+- [Deployment](#deployment)
+- [Source Code](#source-code)
+- [Creating a custom build](#creating-a-custom-build)
+
+## Architecture Overview
 
 Deploying this solution with the default parameters builds the following environment in the AWS Cloud. The overall architecture of the solution has the following key components. Note that the diagram below represents Twitter as the ingestion feed; there are plans to add other social media platforms in future releases.
+
 <p align="center">
 <img src="source/images/architecture.png">
 <br/>
 </p>
 
-* Ingestion – Social media feed ingestion using a combination of Lambda functions, a Kinesis Data Stream, and DynamoDB to manage state
-* Workflow – An AWS Step Functions based workflow to orchestrate the various services
-* Inference – AWS Cloud's machine learning capabilities through Amazon Translate, Amazon Comprehend, and Amazon Rekognition
-* Application Integration – An event-based architecture approach through the use of Amazon EventBridge
-* Storage and Visualization – A combination of Kinesis Data Firehose, S3 buckets, Glue, Athena, and QuickSight
+- Ingestion – Social media feed ingestion using a combination of Lambda functions, a Kinesis Data Stream, and DynamoDB to manage state
+- Workflow – An AWS Step Functions based workflow to orchestrate the various services
+- Inference – AWS Cloud's machine learning capabilities through Amazon Translate, Amazon Comprehend, and Amazon Rekognition
+- Application Integration – An event-based architecture approach through the use of Amazon EventBridge
+- Storage and Visualization – A combination of Kinesis Data Firehose, S3 buckets, Glue, Athena, and QuickSight
 
 <p align="center">
 <img src="source/images/dashboard.png">
````
````diff
@@ -33,35 +42,92 @@ Deploying this solution with the default parameters builds the following environ
 
 After you deploy the solution, use the included Amazon QuickSight dashboard to visualize the solution's machine learning inferences. The image above is an example visualization dashboard featuring a dominant topic list, donut charts, weekly and monthly trend graphs, a word cloud, a tweet table, and a heat map.
 
-## 1. Build the solution
+# AWS CDK Constructs
+
+[AWS CDK Solutions Constructs](https://aws.amazon.com/solutions/constructs/) make it easier to consistently create well-architected applications. All AWS Solutions Constructs are reviewed by AWS and use best practices established by the AWS Well-Architected Framework. This solution uses the following AWS CDK Constructs:
+
+- aws-events-rule-lambda
+- aws-kinesisfirehose-s3
+- aws-kinesisstreams-lambda
+- aws-lambda-dynamodb
+- aws-lambda-s3
+- aws-lambda-step-function
+
+## Deployment
+
+The solution is deployed using a CloudFormation template with a Lambda-backed custom resource that builds the Amazon QuickSight Analysis and Dashboards. For details on deploying the solution, please see the solution home page: [Discovering Hot Topics Using Machine Learning](https://aws.amazon.com/solutions/implementations/discovering-hot-topics-using-machine-learning/)
+
+## Source Code
+
+### Project directory structure
+
+```
+├── deployment [folder containing build scripts]
+│   ├── cdk-solution-helper [a helper function to help deploy lambda function code through S3 buckets]
+└── source [source code containing the CDK App and lambda functions]
+    ├── bin [entrypoint of the CDK application]
+    ├── lambda [folder containing source code for the lambda functions]
+    │   ├── firehose-text-proxy [lambda function to write text analysis output to Amazon Kinesis Firehose]
+    │   ├── firehose_topic_proxy [lambda function to write topic analysis output to Amazon Kinesis Firehose]
+    │   ├── ingestion-consumer [lambda function that consumes messages from the Amazon Kinesis Data Stream]
+    │   ├── ingestion-producer [lambda function that makes the Twitter API call and pushes data to the Amazon Kinesis Data Stream]
+    │   ├── integration [lambda function that publishes inference outputs to Amazon EventBridge]
+    │   ├── storage-firehose-processor [lambda function that writes data to S3 buckets to build a relational model]
+    │   ├── wf-analyze-text [lambda function to detect sentiments, key phrases and entities using Amazon Comprehend]
+    │   ├── wf-check-topic-model [lambda function to check the status of topic modeling jobs on Amazon Comprehend]
+    │   ├── wf-detect-moderation-labels [lambda function to detect content moderation labels using Amazon Rekognition]
+    │   ├── wf-extract-text-in-image [lambda function to extract text content from images using Amazon Rekognition]
+    │   ├── wf-publish-text-inference [lambda function to publish Amazon Comprehend inferences]
+    │   ├── wf-submit-topic-model [lambda function to submit a topic modeling job]
+    │   ├── wf-translate-text [lambda function to translate non-English text using Amazon Translate]
+    │   └── wf_publish_topic_model [lambda function to publish topic modeling inferences from Amazon Comprehend]
+    ├── lib
+    │   ├── ingestion [CDK constructs for data ingestion]
+    │   ├── integration [CDK constructs for Amazon EventBridge]
+    │   ├── storage [CDK constructs that define storage of the inference events]
+    │   ├── text-analysis-workflow [CDK constructs for text analysis of ingested data]
+    │   ├── topic-analysis-workflow [CDK constructs for topic visualization of ingested data]
+    │   └── visualization [CDK constructs to build a relational database model for visualization]
+```
+
+## Creating a custom build
+
+The solution can be deployed through the CloudFormation template available on the solution home page: [Discovering Hot Topics Using Machine Learning](https://aws.amazon.com/solutions/implementations/discovering-hot-topics-using-machine-learning/). To make changes to the solution, use the steps below: download or clone this repo, update the source code, and then run the deployment/build-s3-dist.sh script to deploy the updated Lambda code to an Amazon S3 bucket in your account.
+
+### 1. Clone the repository
 
 Clone this git repository
 
 `git clone https://github.com/awslabs/<repository_name>`
 
-## 2. Build the solution for deployment
+### 2. Build the solution for deployment
+
+- To run the unit tests
 
-* To run the unit tests
 ```
 cd <rootDir>/source
 chmod +x ./run-all-tests.sh
 ./run-all-tests.sh
 ```
 
-* Configure the bucket name of your target Amazon S3 distribution bucket
+- Configure the bucket name of your target Amazon S3 distribution bucket
+
 ```
 export DIST_OUTPUT_BUCKET=my-bucket-name
 export VERSION=my-version
 ```
 
-* Now build the distributable:
+- Now build the distributable:
+
 ```
 cd <rootDir>/deployment
 chmod +x ./build-s3-dist.sh
 ./build-s3-dist.sh $DIST_OUTPUT_BUCKET $SOLUTION_NAME $VERSION $CF_TEMPLATE_BUCKET_NAME QS_TEMPLATE_ACCOUNT
 ```
-* Parameter details
+
+- Parameter details
+
 ```
 $DIST_OUTPUT_BUCKET - This is the global name of the distribution. For the bucket name, the AWS Region is added to the global name (example: 'my-bucket-name-us-east-1') to create a regional bucket. The lambda artifact should be uploaded to the regional buckets for the CloudFormation template to pick it up for deployment.
 $SOLUTION_NAME - The name of this solution (example: discovering-hot-topics-using-machine-learning)
````
````diff
@@ -70,44 +136,14 @@ $CF_TEMPLATE_BUCKET_NAME - The name of the S3 bucket where the CloudFormation te
 $QS_TEMPLATE_ACCOUNT - The account from which the Amazon QuickSight templates should be sourced for Amazon QuickSight Analysis and Dashboard creation
 ```
````
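As a sanity check on the parameter descriptions above, the sketch below (a hypothetical helper, not part of the solution's code) shows how the global distribution bucket name is regionalized and combined with the solution name and version into the S3 key prefix where the Lambda artifacts are expected:

```javascript
// Hypothetical helper (not part of the solution) showing how the build
// parameters combine into the regional bucket name and the S3 key prefix
// that the `aws s3 cp` deployment commands target.
function regionalAssetPrefix(distOutputBucket, region, solutionName, version) {
  const regionalBucket = `${distOutputBucket}-${region}`; // e.g. my-bucket-name-us-east-1
  return `s3://${regionalBucket}/${solutionName}/${version}/`;
}

console.log(
  regionalAssetPrefix(
    'my-bucket-name',
    'us-east-1',
    'discovering-hot-topics-using-machine-learning',
    'v1.3.0'
  )
);
// → s3://my-bucket-name-us-east-1/discovering-hot-topics-using-machine-learning/v1.3.0/
```

The CloudFormation template resolves artifacts against the regional bucket, which is why the upload commands copy into a bucket named with the region suffix.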
````diff
 
+- Deploy the distributable to an Amazon S3 bucket in your account. _Note:_ you must have the AWS Command Line Interface installed.
 
-* Deploy the distributable to an Amazon S3 bucket in your account. _Note:_ you must have the AWS Command Line Interface installed.
 ```
 aws s3 cp ./global-s3-assets/ s3://my-bucket-name-<aws_region>/discovering-hot-topics-using-machine-learning/<my-version>/ --recursive --acl bucket-owner-full-control --profile aws-cred-profile-name
 aws s3 cp ./regional-s3-assets/ s3://my-bucket-name-<aws_region>/discovering-hot-topics-using-machine-learning/<my-version>/ --recursive --acl bucket-owner-full-control --profile aws-cred-profile-name
 ```
 
-## Project directory structure
-```
-├── deployment [folder containing build scripts]
-│   ├── cdk-solution-helper [A helper function to help deploy lambda function code through S3 buckets]
-└── source [source code containing CDK App and lambda functions]
-    ├── bin [entrypoint of the CDK application]
-    ├── lambda [folder containing source code the lambda functions]
-    │   ├── firehose-text-proxy [lambda function to write text analysis output to Amazon Kinesis Firehose]
-    │   ├── firehose_topic_proxy [lambda function to write topic analysis output to Amazon Kinesis Firehose]
-    │   ├── ingestion-consumer [lambda function that consumes messages from Amazon Kinesis Data Stream]
-    │   ├── ingestion-producer [lambda function that makes Twitter API call and pushes data to Amazon Kinesis Data Stream]
-    │   ├── integration [lambda function that publishes inference outputs to Amazon Events Bridge]
-    │   ├── storage-firehose-processor [lambda function that writes data to S3 buckets to build a relational model]
-    │   ├── wf-analyze-text [lambda function to detect sentiments, key phrases and entities using Amazon Comprehend]
-    │   ├── wf-check-topic-model [lambda function to check status of topic modeling jobs on Amazon Comprehend]
-    │   ├── wf-detect-moderation-labels [lambda function to detect content moderation using Amazon Rekognition]
-    │   ├── wf-extract-text-in-image [lambda function to extract text content from images using Amazon Rekognition]
-    │   ├── wf-publish-text-inference [lambda function to publish Amazon Comprehend inferences]
-    │   ├── wf-submit-topic-model [lambda function to submit topic modeling job]
-    │   ├── wf-translate-text [lambda function to translate non-english text using Amazon Translate]
-    │   └── wf_publish_topic_model [lambda function to publish topic modeling inferences from Amazon Comprehend]
-    ├── lib
-    │   ├── ingestion [CDK constructs for data ingestion]
-    │   ├── integration [CDK constructs for Amazon Events Bridge]
-    │   ├── storage [CDK constructs that define storage of the inference events]
-    │   ├── text-analysis-workflow [CDK constructs for text analysis of ingested data]
-    │   ├── topic-analysis-workflow [CDK constructs for topic visualization of ingested data]
-    │   └── visualization [CDK constructs to build a relational database model for visualization]
-```
-
-***
+---
 
 Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
````

deployment/build-s3-dist.sh

Lines changed: 1 addition & 1 deletion
````diff
@@ -26,7 +26,7 @@
 set -e
 
 # Important: CDK global version number
-cdk_version=1.69.0
+cdk_version=1.73.0
 
 # Check to see if input has been provided:
 if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ] || [ -z "$4" ] || [ -z "$5" ] || [ -z "$6" ]; then
````
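The `-z` tests in the guard above treat an unset or empty positional argument as missing and abort the build before any work is done. A minimal JavaScript analogue of that check, for illustration only (the function name and return shape are assumptions, not part of the build script):

```javascript
// Hypothetical analogue of the shell guard `[ -z "$1" ] || ... [ -z "$6" ]`:
// report which of the first `required` positional arguments are missing.
// `-z` is true for an unset or empty string; `!args[i]` mirrors that here.
function missingArgs(args, required) {
  const missing = [];
  for (let i = 0; i < required; i++) {
    if (!args[i]) missing.push(i + 1); // 1-based, like $1..$6
  }
  return missing;
}

console.log(missingArgs(['bucket', 'solution', 'v1.3.0'], 6)); // → [ 4, 5, 6 ]
```

Reporting the missing positions, rather than just failing, makes it easier to see which of the five documented parameters (plus the script's sixth input) was left out of the build invocation.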

source/bin/discovering-hot-topics-app.ts

Lines changed: 1 addition & 1 deletion
````diff
@@ -5,7 +5,7 @@
 * Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance *
 * with the License. A copy of the License is located at *
 * *
-* http://www.apache.org/licenses/LICNSE-2.0 *
+* http://www.apache.org/licenses/LICENSE-2.0 *
 * *
 * or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES *
 * OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions *
````

source/lambda/create-partition/index.js

Lines changed: 1 addition & 1 deletion
````diff
@@ -4,7 +4,7 @@
 * Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance *
 * with the License. A copy of the License is located at *
 * *
-* http://www.apache.org/licenses/LICNSE-2.0 *
+* http://www.apache.org/licenses/LICENSE-2.0 *
 * *
 * or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES *
 * OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions *
````

source/lambda/create-partition/jest.config.js

Lines changed: 1 addition & 1 deletion
````diff
@@ -4,7 +4,7 @@
 * Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance *
 * with the License. A copy of the License is located at *
 * *
-* http://www.apache.org/licenses/LICNSE-2.0 *
+* http://www.apache.org/licenses/LICENSE-2.0 *
 * *
 * or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES *
 * OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions *
````
