Temporal Word Embeddings for Early Detection of Psychological Disorders on Social Media¶
How to early detect psychological disorders on social media using temporal word embeddings¶
Abastract¶
Mental health disorders represent a public health challenge, where early detection is critical to mitigating adverse outcomes for individuals and society. The study of language and behavior is a pivotal component in mental health research, and the content from social media platforms serves as a valuable tool for identifying signs of mental health risks. This paper presents a novel framework leveraging temporal word embeddings to capture linguistic changes over time. We specifically aim at at identifying emerging psychological concerns on social media. By adapting temporal word representations, our approach quantifies shifts in language use that may signal mental health risks. To that end, we implement two alternative temporal word embedding models to detect linguistic variations and exploit these variations to train early detection classifiers. Our experiments, conducted on 18 datasets from the eRisk initiative (covering signs of conditions such as depression, anorexia, and self-harm), show that simple models focusing exclusively on temporal word usage patterns achieve competitive performance compared to state-of-the-art systems. Additionally, we perform a word-level analysis to understand the evolution of key terms among positive and control users. These findings underscore the potential of time-sensitive word models in this domain, being a promising avenue for future research in mental health surveillance.
Models¶
TWEC¶
In this tutorial, we will focus exclusively on TWEC. Let's begin by defining our temporal word embedding models. The first model, TWEC (Temporal Word Embeddings with a Compass), is an extension of Word2Vec that incorporates temporal information. It captures linguistic shifts over time by leveraging the context of surrounding words across different time periods.
from models.twec import TWEC
Deltas¶
Deltas is a metric designed to quantify semantic drift in word meaning over time within a diachronic corpus. It is computed by applying similarity measures—such as cosine similarity or Euclidean distance—between temporally contextualized word embeddings and their corresponding static representations.
from models.deltas import DISTANCES
Filters¶
Now that we have defined our models, we need to encapsulate them inside a class that implements the BaseFilter interface and binds them to the container.
# !pip install framework3==1.1.1
from labchain.utils.patch_type_guard import patch_inspect_for_notebooks
patch_inspect_for_notebooks()
✅ Patched inspect.getsource using dill.
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Literal
from scipy.sparse import dok_matrix
from tqdm import tqdm
from labchain.base.base_clases import BaseFilter
from labchain.base.base_types import XYData
from labchain.container import Container
import pandas as pd
import numpy as np
import torch
import os
@Container.bind()
class TWECFilter(BaseFilter):
def __init__(
self,
context_size: int,
_cpus: int = 4,
deltas_f: List[
Literal[
"cosine",
"euclidean",
"chebyshev",
"jensen_shannon",
"wasserstein",
"manhattan",
"minkowski",
]
] = ["cosine"],
):
super().__init__()
self._twec = TWEC(size=300, window=context_size)
self.deltas_f = deltas_f
self.context_size = context_size
actual_cpus = os.cpu_count()
if actual_cpus is not None:
self._cpus = min(actual_cpus, _cpus)
else:
self._cpus = _cpus
def fit(self, x: XYData, y: XYData | None) -> float | None:
data: pd.DataFrame = x.value
self._twec.train_compass(data.text.values.tolist())
self._vocab_hash_map = dict(
zip(
self._twec.compass.wv.index_to_key, # type: ignore
range(len(self._twec.compass.wv.index_to_key)), # type: ignore
)
)
def predict(self, x: XYData) -> XYData:
data: pd.DataFrame = x.value
n_rows = len(data.index)
n_cols = len(self._vocab_hash_map.items())
metric_names = self.deltas_f
all_deltas = {
metric: dok_matrix((n_rows, n_cols), dtype=np.float32)
for metric in metric_names
}
def process_user_deltas(i, tc):
result = {metric: [] for metric in metric_names}
for word in tc.wv.index_to_key: # type: ignore
if word in self._vocab_hash_map:
j = self._vocab_hash_map[word]
for metric in metric_names:
dist = (
DISTANCES[metric](
torch.tensor(np.array([[self._twec.compass.wv[word]]])), # type: ignore
torch.tensor(np.array([[tc.wv[word]]])), # type: ignore
)
.detach()
.cpu()
.item()
)
result[metric].append((i, j, dist))
return result
with ThreadPoolExecutor(max_workers=self._cpus) as executor:
futures = {
executor.submit(
process_user_deltas, i, self._twec.train_slice(row.text)
): i
for i, row in tqdm(
enumerate(data.itertuples()),
total=n_rows,
desc="generating embeddings",
)
}
for future in tqdm(
as_completed(futures), total=n_rows, desc="parallel prediction"
):
chunk_result = future.result()
for metric, values in chunk_result.items():
for i, j, val in values:
all_deltas[metric][i, j] = val
return XYData.mock(all_deltas)
The classifiers¶
This work addresses an early prediction task using the eRisk dataset, which requires the use of classification models. We will now define a set of classifiers and integrate them within the framework by wrapping them in the appropriate classes.
⚠️ Warning: In order for classes to be parallelizable, they must be defined in a standalone module. For this reason, we have moved the classifiers to separate files. The code shown here is provided for reference purposes only.
⚠️ Warning: Also note that some hyperparameters are not primitive types. While this works well with sklearn and Optuna optimizers, it may break when using the wandb optimizer. The code should be adapted accordingly if you plan to use wandb for optimization.
SVM¶
from typing import Callable, Mapping
from sklearn.svm import SVC
@Container.bind()
class ClassifierSVM(BaseFilter):
def __init__(
self,
C: float = 1,
kernel: Callable | Literal['linear', 'poly', 'rbf', 'sigmoid', 'precomputed'] = "rbf",
gamma: float | Literal['scale', 'auto'] = "scale",
coef0:float=0.0,
tol:float=0.001,
decision_function_shape:Literal['ovo', 'ovr'] = "ovr",
class_weight_1: Mapping[Any, Any] | str | None = None,
probability:bool = False,
):
super().__init__()
self.proba = probability
self._model = SVC(
C=C,
kernel=kernel,
gamma=gamma,
coef0=coef0,
tol=tol,
decision_function_shape=decision_function_shape,
class_weight={1: class_weight_1},
probability=probability,
random_state=43,
)
def fit(self, x: XYData, y: XYData | None):
if y is None:
raise ValueError("y must be provided for training")
self._model.fit(x.value, y.value)
def predict(self, x: XYData) -> XYData:
if self.proba:
result = list(map(lambda i: i[1], self._model.predict_proba(x.value)))
return XYData.mock(result)
else:
result = self._model.predict(x.value)
return XYData.mock(result)
from models.svm import ClassifierSVM
The Metrics¶
Since the early prediction task is essentially a classification problem, we will use standard classification metrics such as F1-score, Precision, and Recall. However, due to the early nature of the task, we also need to include metrics that penalize delayed decisions, as timing is a critical aspect of the evaluation.
from labchain.plugins.metrics import F1, Precission, Recall
f1 = F1()
precision = Precission()
recall = Recall()
from typing import Iterable
from sklearn.metrics import confusion_matrix
from labchain import BaseMetric
from numpy import exp
@Container.bind()
class ERDE(BaseMetric):
def __init__(self, count: Iterable, k: int = 5):
self.k = k
self.count = count
def evaluate(
self, x_data: XYData, y_true: XYData | None, y_pred: XYData
) -> float | np.ndarray:
if y_true is None:
raise ValueError("y_true must be provided for evaluation")
all_erde = []
_, _, _, tp = confusion_matrix(y_true.value, y_pred.value).ravel()
for expected, result, count in list(
zip(y_true.value, y_pred.value, self.count)
):
if result == 1 and expected == 0:
all_erde.append(float(tp) / len(y_true.value))
elif result == 0 and expected == 1:
all_erde.append(1.0)
elif result == 1 and expected == 1:
all_erde.append(1.0 - (1.0 / (1.0 + exp(count - self.k))))
elif result == 0 and expected == 0:
all_erde.append(0.0)
return float(np.mean(all_erde) * 100)
from metrics.erde import ERDE_5, ERDE_50
For simplicity, we will only consider the 2023 gambling data.¶
gambling_2023_train = pd.read_csv("data/standard_gambling_train_2023.csv", index_col=0)
gambling_2023_train.head(5)
gambling_2023_test = pd.read_csv("data/standard_gambling_2023.csv", index_col=0)
gambling_2023_test.head(5)
| id | text | date | chunk | label | user | |
|---|---|---|---|---|---|---|
| 0 | subject5539_0 | For PC: I don't know what company, but they ne... | 2015-12-03 13:31:29 | 0 | 0 | subject5539 |
| 1 | subject5539_1 | You play as a Pokmon trainer (that you customi... | 2015-12-04 19:45:40 | 0 | 0 | subject5539 |
| 2 | subject5539_2 | A Clash of Clans RPG (or MMORPG) | 2015-12-23 23:32:51 | 0 | 0 | subject5539 |
| 3 | subject5539_3 | You would have to manage your species's needs ... | 2015-12-26 20:45:30 | 0 | 0 | subject5539 |
| 4 | subject5539_4 | The game starts you as a child and you have to... | 2016-01-02 08:40:18 | 0 | 0 | subject5539 |
gg_2023_train = (
gambling_2023_train.groupby(["user", "chunk"])
.agg(
{
"id": "count",
"text": list,
"date": list,
"label": "first",
}
)
.rename(columns={"id": "n_texts"})
.reset_index()
)
gg_2023_test = (
gambling_2023_test.groupby(["user", "chunk"])
.agg(
{
"id": "count",
"text": list,
"date": list,
"label": "first",
}
)
.rename(columns={"id": "n_texts"})
.reset_index()
)
gg_2023_train
| user | chunk | n_texts | text | date | label | |
|---|---|---|---|---|---|---|
| 0 | subject1 | 0 | 132 | [Vulcan's ultimate landing at max range is so ... | [2017-08-18 11:34:09, 2017-08-20 15:26:34, 201... | 0 |
| 1 | subject1 | 1 | 132 | [Awesome! It is always good to hear these news... | [2018-05-18 23:46:33, 2018-06-18 17:17:55, 201... | 0 |
| 2 | subject1 | 2 | 132 | [The syringe is a lie!, I'd say Scylla or Than... | [2018-09-20 08:20:44, 2018-09-24 10:12:03, 201... | 0 |
| 3 | subject1 | 3 | 131 | [Some of the symptoms you may experience are b... | [2019-05-06 17:50:52, 2019-05-06 19:05:44, 201... | 0 |
| 4 | subject1 | 4 | 132 | [So ur saying that huge map is better than Afg... | [2019-10-06 23:22:46, 2019-10-12 18:08:06, 201... | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 39265 | subject9999 | 5 | 8 | [10v10, is those a bunch of bots, I didn't eve... | [2021-07-04 06:51:04, 2021-07-04 07:01:05, 202... | 0 |
| 39266 | subject9999 | 6 | 8 | [I'm commenting this based on the fact that Am... | [2021-07-15 19:11:42, 2021-07-27 17:44:09, 202... | 0 |
| 39267 | subject9999 | 7 | 8 | [Aesthetic set, It's a fucking downgrade,, It'... | [2021-09-12 14:55:05, 2021-09-23 00:31:11, 202... | 0 |
| 39268 | subject9999 | 8 | 8 | [u/save, u/savevideo, Snu snu ! Snu snu! Snu s... | [2021-09-23 13:38:48, 2021-10-08 13:44:30, 202... | 0 |
| 39269 | subject9999 | 9 | 8 | [Why every fucking time there's a new weapon o... | [2021-11-27 06:45:16, 2021-11-27 07:04:36, 202... | 0 |
39270 rows × 6 columns
⚠️ Warning: There are several restrictions for the plugins to work properly:
- Constructor arguments should be public attributes.
- Other data must be set as private attributes.
- All public attributes must be serializable using
jsonable_encoder.
test_erde_5 = ERDE_5()
test_erde_50 = ERDE_50()
Selector¶
We are using sklearn for grid search. This optimizer will check the input dimensions of the X and y values. We have generated a dictionary with the deltas based on different distance measures, but this results in an incompatible dimensions error from sklearn. To work around this issue, we define a class that selects the appropriate deltas based on a hyperparameter.
@Container.bind()
class DeltaSelectorFilter(BaseFilter):
def __init__(
self,
deltas_f: Literal[
"cosine",
"euclidean",
"chebyshev",
"jensen_shannon",
"wasserstein",
"manhattan",
"minkowski",
] = "cosine",
):
super().__init__()
self.deltas_f = deltas_f
def fit(self, x: XYData, y: XYData | None):
pass
def predict(self, x: XYData) -> XYData:
# Crear una nueva dok_matrix con el mismo shape y en float32
old_dok = x.value[self.deltas_f]
new_dok = dok_matrix(old_dok.shape, dtype=np.float32)
# Copiar todos los valores existentes y convertir el dtype
for (i, j), value in old_dok.items():
new_dok[i, j] = float(value) # conversión a float32 implícita
return XYData.mock(new_dok.tocsr())
The pipeline¶
Now comes the most exciting part: integrating the filters into the pipeline. This step can be done incrementally, which is more convenient when developing a model. However, since we already have a clear understanding of the process, we will combine all the parts into one step.
from labchain import Cached, SklearnOptimizer
from labchain.plugins.pipelines.sequential import F3Pipeline
all_test_metrics = [
f1,
precision,
recall,
test_erde_5,
test_erde_50,
]
pipeline_svm = F3Pipeline(
filters=[
Cached(
filter=TWECFilter(
context_size=25,
_cpus=10,
deltas_f=["cosine", "euclidean", "manhattan", "chebyshev"],
),
),
DeltaSelectorFilter(deltas_f="cosine"),
F3Pipeline(
filters=[
ClassifierSVM(
tol=0.003,
probability=False,
decision_function_shape="ovr",
kernel="rbf",
gamma="scale",
).grid(
{
"C": [1, 3, 5],
"class_weight_1": [{1: 1.5}, {1: 2.5}, {1: 3.0}],
}
)
],
metrics=[F1()],
).optimizer(SklearnOptimizer(scoring="f1_weighted", cv=2, n_jobs=-1)),
],
metrics=all_test_metrics,
)
Data Preparation¶
In F3, all data must be wrapped in the XYData class. This ensures that each data transformation is hashed and the results are cached.
train_x = XYData(_hash=" Gambling_2023_train_x", _path="/dataset", _value=gg_2023_train)
train_y = XYData(
_hash="Gambling_2023_train_y", _path="/dataset", _value=gg_2023_train.label.tolist()
)
test_x = XYData(_hash="Gambling_2023_test_x", _path="/dataset", _value=gg_2023_test)
test_y = XYData(
_hash="Gambling_2023_test_y", _path="/dataset", _value=gg_2023_test.label.tolist()
)
Model training¶
⚠️ Warning: Please note that for parallel backend usage, a considerable amount of RAM will be required.
from joblib import parallel_backend
import sys
with parallel_backend("loky", n_jobs=-1):
print("Starting GridSearchCV fitting...", flush=True)
pipeline_svm.fit(train_x, train_y)
sys.stdout.flush()
Starting GridSearchCV fitting... Calling prefit on Cached Calling prefit on DeltaSelectorFilter Calling prefit on SklearnOptimizer
____________________________________________________________________________________________________
Fitting pipeline...
****************************************************************************************************
Cached( filter=TWECFilter(deltas_f=['cosine', 'euclidean', 'manhattan', 'chebyshev'], context_size=25), cache_data=True, cache_filter=True, overwrite=False, storage=None )
Calling prefit on TWECFilter
- El filtro TWECFilter({'deltas_f': ['cosine', 'euclidean', 'manhattan', 'chebyshev'], 'context_size': 25}) Existe, se carga del storage.
- El dato XYData(_hash='f991d8f14f3bbdb0a54b565de7e60e42cfd36dc9', _path='TWECFilter/60b322a0bd676ce665f0d6b568a28ef664fef914') Existe, se carga del storage.
DeltaSelectorFilter(deltas_f='cosine')
Calling prefit on DeltaSelectorFilter * Downloading: <_io.BufferedReader name='cache/TWECFilter/60b322a0bd676ce665f0d6b568a28ef664fef914/f991d8f14f3bbdb0a54b565de7e60e42cfd36dc9'>
SklearnOptimizer( scoring='f1_weighted', cv=2, pipeline=F3Pipeline( filters=[ClassifierSVM(proba=False)], metrics=[F1(average='binary')], overwrite=False, store=False, log=False ), n_jobs=-1 )
Calling prefit on SklearnOptimizer
Fitting 2 folds for each of 9 candidates, totalling 18 fits
[CV 1/2; 3/9] START ClassifierSVM__C=1, ClassifierSVM__class_weight_1={1: 3.0}..
[CV 2/2; 7/9] START ClassifierSVM__C=5, ClassifierSVM__class_weight_1={1: 1.5}..
Calling prefit on ClassifierSVM
[CV 1/2; 1/9] START ClassifierSVM__C=1, ClassifierSVM__class_weight_1={1: 1.5}..
[CV 1/2; 6/9] START ClassifierSVM__C=3, ClassifierSVM__class_weight_1={1: 3.0}..
[CV 2/2; 9/9] START ClassifierSVM__C=5, ClassifierSVM__class_weight_1={1: 3.0}..
Calling prefit on ClassifierSVM
Calling prefit on ClassifierSVM
Calling prefit on ClassifierSVM
Calling prefit on ClassifierSVM
[CV 1/2; 9/9] START ClassifierSVM__C=5, ClassifierSVM__class_weight_1={1: 3.0}..
Calling prefit on ClassifierSVM
[CV 2/2; 3/9] START ClassifierSVM__C=1, ClassifierSVM__class_weight_1={1: 3.0}..
Calling prefit on ClassifierSVM
[CV 1/2; 5/9] START ClassifierSVM__C=3, ClassifierSVM__class_weight_1={1: 2.5}..
[CV 2/2; 1/9] START ClassifierSVM__C=1, ClassifierSVM__class_weight_1={1: 1.5}..
[CV 1/2; 2/9] START ClassifierSVM__C=1, ClassifierSVM__class_weight_1={1: 2.5}..
[CV 1/2; 7/9] START ClassifierSVM__C=5, ClassifierSVM__class_weight_1={1: 1.5}..
Calling prefit on ClassifierSVM
[CV 2/2; 4/9] START ClassifierSVM__C=3, ClassifierSVM__class_weight_1={1: 1.5}..
Calling prefit on ClassifierSVM
Calling prefit on ClassifierSVM
Calling prefit on ClassifierSVM
Calling prefit on ClassifierSVM
[CV 2/2; 2/9] START ClassifierSVM__C=1, ClassifierSVM__class_weight_1={1: 2.5}..
[CV 1/2; 8/9] START ClassifierSVM__C=5, ClassifierSVM__class_weight_1={1: 2.5}..
[CV 1/2; 4/9] START ClassifierSVM__C=3, ClassifierSVM__class_weight_1={1: 1.5}..
[CV 2/2; 5/9] START ClassifierSVM__C=3, ClassifierSVM__class_weight_1={1: 2.5}..
Calling prefit on ClassifierSVM
Calling prefit on ClassifierSVM
[CV 2/2; 6/9] START ClassifierSVM__C=3, ClassifierSVM__class_weight_1={1: 3.0}..
Calling prefit on ClassifierSVM
Calling prefit on ClassifierSVM
[CV 2/2; 8/9] START ClassifierSVM__C=5, ClassifierSVM__class_weight_1={1: 2.5}..
Calling prefit on ClassifierSVM
Calling prefit on ClassifierSVM
[CV 1/2; 9/9] END ClassifierSVM__C=5, ClassifierSVM__class_weight_1={1: 3.0};, score=0.965 total time=24.9min
[CV 1/2; 5/9] END ClassifierSVM__C=3, ClassifierSVM__class_weight_1={1: 2.5};, score=0.965 total time=25.2min
[CV 1/2; 1/9] END ClassifierSVM__C=1, ClassifierSVM__class_weight_1={1: 1.5};, score=0.964 total time=25.5min
[CV 2/2; 2/9] END ClassifierSVM__C=1, ClassifierSVM__class_weight_1={1: 2.5};, score=0.964 total time=26.1min
[CV 1/2; 4/9] END ClassifierSVM__C=3, ClassifierSVM__class_weight_1={1: 1.5};, score=0.965 total time=26.3min
[CV 1/2; 2/9] END ClassifierSVM__C=1, ClassifierSVM__class_weight_1={1: 2.5};, score=0.964 total time=26.4min
[CV 2/2; 1/9] END ClassifierSVM__C=1, ClassifierSVM__class_weight_1={1: 1.5};, score=0.963 total time=26.5min
[CV 1/2; 7/9] END ClassifierSVM__C=5, ClassifierSVM__class_weight_1={1: 1.5};, score=0.965 total time=26.8min
[CV 1/2; 3/9] END ClassifierSVM__C=1, ClassifierSVM__class_weight_1={1: 3.0};, score=0.964 total time=27.4min
[CV 1/2; 6/9] END ClassifierSVM__C=3, ClassifierSVM__class_weight_1={1: 3.0};, score=0.965 total time=27.6min
[CV 2/2; 6/9] END ClassifierSVM__C=3, ClassifierSVM__class_weight_1={1: 3.0};, score=0.966 total time=27.7min
[CV 1/2; 8/9] END ClassifierSVM__C=5, ClassifierSVM__class_weight_1={1: 2.5};, score=0.965 total time=27.8min
[CV 2/2; 3/9] END ClassifierSVM__C=1, ClassifierSVM__class_weight_1={1: 3.0};, score=0.965 total time=28.1min
[CV 2/2; 9/9] END ClassifierSVM__C=5, ClassifierSVM__class_weight_1={1: 3.0};, score=0.966 total time=28.4min
[CV 2/2; 4/9] END ClassifierSVM__C=3, ClassifierSVM__class_weight_1={1: 1.5};, score=0.964 total time=28.8min
[CV 2/2; 8/9] END ClassifierSVM__C=5, ClassifierSVM__class_weight_1={1: 2.5};, score=0.965 total time=28.9min
[CV 2/2; 5/9] END ClassifierSVM__C=3, ClassifierSVM__class_weight_1={1: 2.5};, score=0.966 total time=29.2min
[CV 2/2; 7/9] END ClassifierSVM__C=5, ClassifierSVM__class_weight_1={1: 1.5};, score=0.965 total time=29.3min
Calling prefit on ClassifierSVM
param_ClassifierSVM__C param_ClassifierSVM__class_weight_1 \ 5 3 {1: 3.0} 4 3 {1: 2.5} 8 5 {1: 3.0} 7 5 {1: 2.5} 6 5 {1: 1.5} 2 1 {1: 3.0} 3 3 {1: 1.5} 1 1 {1: 2.5} 0 1 {1: 1.5} params split0_test_score \ 5 {'ClassifierSVM__C': 3, 'ClassifierSVM__class_... 0.965313 4 {'ClassifierSVM__C': 3, 'ClassifierSVM__class_... 0.965364 8 {'ClassifierSVM__C': 5, 'ClassifierSVM__class_... 0.965114 7 {'ClassifierSVM__C': 5, 'ClassifierSVM__class_... 0.965356 6 {'ClassifierSVM__C': 5, 'ClassifierSVM__class_... 0.965201 2 {'ClassifierSVM__C': 1, 'ClassifierSVM__class_... 0.964449 3 {'ClassifierSVM__C': 3, 'ClassifierSVM__class_... 0.964589 1 {'ClassifierSVM__C': 1, 'ClassifierSVM__class_... 0.964195 0 {'ClassifierSVM__C': 1, 'ClassifierSVM__class_... 0.963823 split1_test_score mean_test_score std_test_score rank_test_score 5 0.965726 0.965519 0.000206 1 4 0.965614 0.965489 0.000125 2 8 0.965626 0.965370 0.000256 3 7 0.965307 0.965332 0.000024 4 6 0.965129 0.965165 0.000036 5 2 0.965158 0.964803 0.000354 6 3 0.964434 0.964512 0.000078 7 1 0.964178 0.964186 0.000009 8 0 0.963234 0.963529 0.000295 9
_y = pipeline_svm.predict(test_x)
____________________________________________________________________________________________________
Predicting pipeline...
****************************************************************************************************
Cached( filter=TWECFilter(deltas_f=['cosine', 'euclidean', 'manhattan', 'chebyshev'], context_size=25), cache_data=True, cache_filter=True, overwrite=False, storage=None )
- El dato XYData(_hash='ed20b892e7858a253df46cdd3d19ef040844625d', _path='TWECFilter/60b322a0bd676ce665f0d6b568a28ef664fef914') Existe, se carga del storage.
DeltaSelectorFilter(deltas_f='cosine')
* Downloading: <_io.BufferedReader name='cache/TWECFilter/60b322a0bd676ce665f0d6b568a28ef664fef914/ed20b892e7858a253df46cdd3d19ef040844625d'>
SklearnOptimizer( scoring='f1_weighted', cv=2, pipeline=F3Pipeline( filters=[ClassifierSVM(proba=False)], metrics=[F1(average='binary')], overwrite=False, store=False, log=False ), n_jobs=-1 )
Evaluation¶
After training the model on the training set using cross-validation, we evaluate its performance on the test set. This comparison is somewhat biased, as it involves predicting the label of individual chunks while evaluating against labels that were propagated from user-level annotations to their corresponding chunks.
pipeline_svm.evaluate(test_x, test_y, _y)
____________________________________________________________________________________________________
Evaluating pipeline......
****************************************************************************************************
{'F1': 0.845859872611465,
'Precission': 0.8736842105263158,
'Recall': 0.8197530864197531,
'ERDE_5': 3.0647720557294345,
'ERDE_50': 0.959133296979966}
If we perform a fairer evaluation by propagating the predictions to the user level—assigning a user as positive if at least one of their chunks is predicted as positive—we observe that the performance remains similar or even improves, which indicates that the system is working as intended.
gg_2023_test["_y"] = _y.value
gg_2023_test.head(5)
| user | chunk | n_texts | text | date | label | _y | |
|---|---|---|---|---|---|---|---|
| 0 | subject1 | 0 | 64 | [Dope, No retcons or changes. The way it was, ... | [2020-08-03 21:33:23, 2020-08-12 16:41:30, 202... | 0 | 0 |
| 1 | subject1 | 1 | 64 | [Where did you get this?, I have no idea how t... | [2020-10-04 22:34:08, 2020-10-04 22:38:55, 202... | 0 | 0 |
| 2 | subject1 | 2 | 64 | [A little something im working on, Tried to do... | [2021-02-10 21:13:06, 2021-02-17 20:34:45, 202... | 0 | 0 |
| 3 | subject1 | 3 | 64 | [Oh the episodes after the characters stories ... | [2021-04-08 22:55:11, 2021-04-08 23:48:15, 202... | 0 | 0 |
| 4 | subject1 | 4 | 64 | [They need to drop easter eggs or hints like t... | [2021-04-25 23:19:00, 2021-04-28 23:10:38, 202... | 0 | 0 |
aux = gg_2023_test.groupby(["user"]).agg(
{"label": "first", "_y": lambda x: 1 if any(list(x)) else 0}
)
aux
| label | _y | |
|---|---|---|
| user | ||
| subject1 | 0 | 0 |
| subject10 | 0 | 0 |
| subject10000 | 0 | 0 |
| subject1001 | 0 | 0 |
| subject1005 | 0 | 0 |
| ... | ... | ... |
| subject9982 | 0 | 0 |
| subject9984 | 0 | 0 |
| subject999 | 0 | 0 |
| subject9990 | 0 | 0 |
| subject9999 | 0 | 0 |
2079 rows × 2 columns
aux.groupby(["label"]).describe()
| _y | ||||||||
|---|---|---|---|---|---|---|---|---|
| count | mean | std | min | 25% | 50% | 75% | max | |
| label | ||||||||
| 0 | 1998.0 | 0.009009 | 0.094511 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | 81.0 | 0.975309 | 0.156150 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
{
"F1": F1().evaluate(
test_x, XYData.mock(aux.label.tolist()), XYData.mock(aux._y.tolist())
),
"Precission": Precission().evaluate(
test_x, XYData.mock(aux.label.tolist()), XYData.mock(aux._y.tolist())
),
"Recall": Recall().evaluate(
test_x, XYData.mock(aux.label.tolist()), XYData.mock(aux._y.tolist())
),
"ERDE_5": test_erde_5.evaluate(
test_x, XYData.mock(aux.label.tolist()), XYData.mock(aux._y.tolist())
),
"ERDE_50": test_erde_50.evaluate(
test_x, XYData.mock(aux.label.tolist()), XYData.mock(aux._y.tolist())
),
}
{'F1': 0.8876404494382022,
'Precission': 0.8144329896907216,
'Recall': 0.9753086419753086,
'ERDE_5': 3.3773273408026343,
'ERDE_50': 1.8609453827254776}