Text Prediction Pipeline
PeilanWang edited this page May 4, 2021 · 2 revisions
The text pipeline is integrated into the UniversalMLObject architecture with small variations.
Pipeline:
| File Name | Description |
|---|---|
| server/dependency.py | Text objects are stored in the Object collection, with text_content as an optional field |
| server/db_connection.py | Uses the object_collection functions |
| server/routers/prediction.py | Uses object_endpoint to predict, get a result, and store the result |
| server/prediction_worker/utility/main.py | Uses the same predict method as image and other forms of data |
Summary: This endpoint, at localhost:5000/model/predict, receives a list of text files, runs predictions on them with the given models, and then stores the UniversalMLObject in the object_collection.
Params:
- objects: A list of files to run predictions on. For text, each one currently has to be a '.txt' file containing the text to be predicted
- models: A list of model names (strings)
- current_user: The username of the current user
- model_type: The type of the input files, "text" in this case
Return: A dictionary with the key "text", whose value is a list of hashes for the different files. Example:

    {
        "text": [
            "bd931a6a2262fbab85c18b5a9bfa5a78",
            "cd931a6a2widjfidf5c18b5a9bfa5a78"
        ]
    }
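
The shape of the return value can be sketched as follows. This is only an illustration under stated assumptions: the source names the variable hash_md5 but does not show how it is computed, so hashing the raw file bytes with MD5 is a guess, and `md5_hex` and `build_predict_response` are hypothetical helper names that do not come from the server code.

```python
import hashlib
from typing import Dict, Iterable, List


def md5_hex(data: bytes) -> str:
    """Return the 32-character hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def build_predict_response(
    file_contents: Iterable[bytes], model_type: str = "text"
) -> Dict[str, List[str]]:
    """Group one hash per uploaded file under the model_type key.

    Assumption: the hash is taken over the file's raw bytes.
    """
    return {model_type: [md5_hex(content) for content in file_contents]}


if __name__ == "__main__":
    # Two in-memory stand-ins for uploaded '.txt' files.
    print(build_predict_response([b"I love this.", b"Not great."]))
```

Under this assumption, uploading the same file twice would yield the same hash, which may be why the job_id in the enqueue call also carries a uuid suffix.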
Prediction is mostly the same for all types of data. However, when model_type = 'text':
- text_content is stored in the UniversalMLObject as a str
- The 'utility.main.predict_object' function is then called with the params hash_md5 and text_content
- For other forms of data, file_name is passed as the param instead of text_content, so that the prediction worker can access the data in the shared folder by file_name
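
    else:
        # Text input: read the uploaded file and decode its bytes to a str
        text_content = file_obj.read()
        text_content = text_content.decode('UTF-8')
        for model in models:
            # One queue per model name; the uuid4 suffix keeps each job_id unique
            Queue(name=model, connection=redis).enqueue(
                'utility.main.predict_object', hash_md5, text_content, job_id=hash_md5+model+str(uuid.uuid4())
            )

Summary: The server uses this function ('utility.main.predict_object') to have the prediction workers run a prediction.
Params: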
- Object_hash: The MD5 hash of the object
- prediction_obj: For text, this is the text_content (the data as a str); for other data, it is a file path that can be used to find the data
Output:
- Prediction result
- If the prediction fails, an error message is printed to the console
Models:
- Simple_text: Accepts any string and always returns Positive: 1 as the result
- Sentiment_analysis: A Hugging Face model that determines whether a piece of text is positive or negative and gives a score
- NER_analysis: A Hugging Face model that returns the named entities in the text, including persons, locations, and organizations
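
A minimal sketch of how a worker-side predict_object could behave for text input, assuming a simple name-to-function registry. The `MODELS` registry, the `model_name` parameter, and the dictionary form of the `Positive: 1` result are assumptions made for illustration; in the real pipeline the model is probably selected by the queue the job was placed on. Only the Simple_text behaviour is reproduced here, since the two Hugging Face models would require downloading model weights.

```python
from typing import Any, Callable, Dict, Optional


def simple_text(text: str) -> Dict[str, int]:
    """Stand-in for the Simple_text model: ignores its input entirely."""
    return {"Positive": 1}


# Hypothetical registry; the real worker may load its models differently.
MODELS: Dict[str, Callable[[str], Any]] = {"simple_text": simple_text}


def predict_object(
    object_hash: str, prediction_obj: str, model_name: str = "simple_text"
) -> Optional[Any]:
    """Run one model on the text and return its result.

    On failure, print an error message to the console and return None.
    """
    try:
        model = MODELS[model_name]
        return model(prediction_obj)
    except Exception as exc:  # broad on purpose: any failure is reported
        print(f"Prediction failed for object {object_hash}: {exc}")
        return None


if __name__ == "__main__":
    print(predict_object("bd931a6a2262fbab85c18b5a9bfa5a78", "What a nice day"))
```

Returning None on failure is a design choice of this sketch; the source only states that an error message is printed to the console.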