File: //usr/libexec/oracle-cloud-agent/plugins/osms/charset_normalizer/api.pyc
a
ٓ�fiR �
@ s� d dl Z d dlmZ d dlmZmZmZmZmZ ddl m
Z
mZmZm
Z
ddlmZmZmZmZ ddlmZ ddlmZmZ dd lmZmZmZmZmZmZmZ e � d
�Z!e �"� Z#e#�$e �%d�� dee&e'f e(e(e)eee* eee* e+e+e)e+ed�dd�Z,dee(e(e)eee* eee* e+e+e)e+ed�dd�Z-d ee*e&ef e(e(e)eee* eee* e+e+e)e+ed�dd�Z.d!eee*ee&f e(e(e)eee* eee* e+e+e)e+e+d�dd�Z/dS )"� N)�PathLike)�BinaryIO�List�Optional�Set�Union� )�coherence_ratio�encoding_languages�mb_encoding_languages�merge_coherence_ratios)�IANA_SUPPORTED�TOO_BIG_SEQUENCE�TOO_SMALL_SEQUENCE�TRACE)�
mess_ratio)�CharsetMatch�CharsetMatches)�any_specified_encoding�cut_sequence_chunks� iana_name�identify_sig_or_bom�
is_cp_similar�is_multi_byte_encoding�should_strip_sig_or_bom�charset_normalizerz)%(asctime)s | %(levelname)s | %(message)s� � 皙�����?TF皙�����?)� sequences�steps�
chunk_size� threshold�cp_isolation�cp_exclusion�preemptive_behaviour�explain�language_threshold�enable_fallback�returnc
/ C s t | ttf�s td�t| ����|r>tj}
t�t � t�
t� t| �}|dkr�t�
d� |rvt�t � t�
|
prtj� tt| dddg d�g�S |dur�t�td d
�|�� dd� |D �}ng }|dur�t�td
d
�|�� dd� |D �}ng }||| k�rt�td|||� d}|}|dk�r:|| |k �r:t|| �}t| �tk }t| �tk}
|�rlt�td�|�� n|
�r�t�td�|�� g }|�r�t| �nd}|du�r�|�|� t�td|� t� }g }g }d}d}d}t� }t| �\}}|du�r|�|� t�tdt|�|� |�d� d|v�r.|�d� |t D �]�}|�rP||v�rP�q6|�rd||v �rd�q6||v �rr�q6|�|� d}||k}|�o�t|�}|dv �r�|�s�t�td|� �q6|dv �r�|�s�t�td|� �q6zt|�}W n, t t!f�y t�td|� Y �q6Y n0 zr|
�r^|du �r^t"|du �rB| dtd�� n| t|�td�� |d� n&t"|du �rn| n| t|�d� |d�}W nb t#t$f�y� } zDt |t$��s�t�td|t"|�� |�|� W Y d}~�q6W Y d}~n
d}~0 0 d}|D ]} t%|| ��r�d} �q�q�|�r*t�td|| � �q6t&|�s6dnt|�|t|| ��}!|�of|du�oft|�|k }"|"�r|t�td |� tt|!�d! �}#t'|#d"�}#d}$d}%g }&g }'z�t(| ||!||||||� D ]|}(|&�|(� |'�t)|(||du �o�dt|� k�o�d"kn �� |'d# |k�r|$d7 }$|$|#k�s4|�r�|du �r� �q>�q�W nB t#�y� } z(t�td$|t"|�� |#}$d}%W Y d}~n
d}~0 0 |%�s|
�r|�sz| td%�d� j*|d&d'� W nR t#�y } z8t�td(|t"|�� |�|� W Y d}~�q6W Y d}~n
d}~0 0 |'�rt+|'�t|'� nd})|)|k�s6|$|#k�r�|�|� t�td)||$t,|)d* d+d,�� | �r6|dd|fv �r6|%�s6t| ||dg |�}*||k�r�|*}n|dk�r�|*}n|*}�q6t�td-|t,|)d* d+d,�� |�s�t-|�}+nt.|�}+|+�rt�td.�|t"|+��� g },|dk�rF|&D ],}(t/|(||+�r2d/�|+�nd�}-|,�|-� �qt0|,�}.|.�rht�td0�|.|�� |�t| ||)||.|�� ||ddfv �r�|)d1k �r�t�
d2|� |�r�t�t � t�
|
� t|| g� S ||k�r6t�
d3|� |�rt�t � t�
|
� t|| g� S �q6t|�dk�r�|�s8|�s8|�rDt�td4� |�rdt�
d5|j1� |�|� nd|�rt|du �s�|�r�|�r�|j2|j2k�s�|du�r�t�
d6� |�|� n|�r�t�
d7� |�|� |�r�t�
d8|�3� j1t|�d � n
t�
d9� |� rt�t � t�
|
� |S ):af
Given a raw bytes sequence, return the best possible charsets usable to render str objects.
If there are no results, it is a strong indicator that the source is binary/not text.
By default, the process extracts 5 blocks of 512 bytes each to assess the mess and coherence of a given sequence,
and gives up on a particular code page after 20% of measured mess. These criteria are customizable at will.
The preemptive behaviour DOES NOT replace the traditional detection workflow; it prioritizes a particular code page
but never takes it for granted. It can improve performance.
To focus detection on some code pages and/or exclude others, use cp_isolation and cp_exclusion
respectively.
This function strips the SIG from the payload/sequence every time, except for UTF-16 and UTF-32.
By default the library does not set up any handler other than the NullHandler. If you set the 'explain'
toggle to True, it alters the logger configuration to add a StreamHandler that is suitable for debugging.
A custom logging format and handler can be set manually.
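The sampling defaults described above (5 blocks of 512 bytes, giving up at 20% mean mess) can be illustrated with a minimal, standard-library-only sketch. It is not the library's implementation: the helper names plan_chunks and is_too_messy are hypothetical, and the mess ratios passed in are made-up inputs rather than values computed by the detector.

```python
from typing import List, Tuple


def plan_chunks(length: int, steps: int = 5, chunk_size: int = 512) -> List[Tuple[int, int]]:
    """Return (start, end) byte offsets of the blocks that would be sampled.

    Hypothetical helper: evenly spaced blocks across a payload of `length` bytes.
    """
    if length == 0:
        return []
    # Payload too small for the requested sampling: use a single block.
    if length <= steps * chunk_size:
        steps, chunk_size = 1, length
    stride = length // steps
    return [
        (start, min(start + chunk_size, length))
        for start in range(0, length, stride)
    ]


def is_too_messy(ratios: List[float], threshold: float = 0.2) -> bool:
    """Hypothetical helper: True when the mean mess ratio reaches the threshold."""
    mean = sum(ratios) / len(ratios) if ratios else 0.0
    return mean >= threshold


if __name__ == "__main__":
    # A 10,000-byte payload sampled with the default steps and chunk_size.
    print(plan_chunks(10_000))
    # Illustrative per-block mess ratios (assumed values).
    print(is_too_messy([0.10, 0.40]))
```

With a payload smaller than steps * chunk_size, the sketch collapses to one block covering the whole payload, which mirrors the "override steps and chunk_size" log message emitted by the function.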
z4Expected object of type bytes or bytearray, got: {0}r z<Encoding detection on empty bytes, assuming utf_8 intention.�utf_8g F� Nz`cp_isolation is set. use this flag for debugging purpose. limited list of encoding allowed : %s.z, c S s g | ]}t |d ��qS �F�r ��.0�cp� r2 �z/sparta/input/_build_configuration/image_build+validate/lib/bmcenv/lib64/python3.9/site-packages/charset_normalizer/api.py�
<listcomp>[ � zfrom_bytes.<locals>.<listcomp>zacp_exclusion is set. use this flag for debugging purpose. limited list of encoding excluded : %s.c S s g | ]}t |d ��qS r- r. r/ r2 r2 r3 r4 f r5 z^override steps (%i) and chunk_size (%i) as content does not fit (%i byte(s) given) parameters.r z>Trying to detect encoding from a tiny portion of ({}) byte(s).zIUsing lazy str decoding because the payload is quite large, ({}) byte(s).z@Detected declarative mark in sequence. Priority +1 given for %s.zIDetected a SIG or BOM mark on first %i byte(s). Priority +1 given for %s.�ascii> �utf_32�utf_16z\Encoding %s won't be tested as-is because it require a BOM. Will try some sub-encoder LE/BE.> �utf_7zREncoding %s won't be tested as-is because detection is unreliable without BOM/SIG.z2Encoding %s does not provide an IncrementalDecoderg ��A)�encodingz9Code page %s does not fit given bytes sequence at ALL. %sTzW%s is deemed too similar to code page %s and was consider unsuited already. Continuing!zpCode page %s is a multi byte encoding table and it appear that at least one character was encoded using n-bytes.� � ���zaLazyStr Loading: After MD chunk decode, code page %s does not fit given bytes sequence at ALL. %sg j�@�strict)�errorsz^LazyStr Loading: After final lookup, code page %s does not fit given bytes sequence at ALL. %szc%s was excluded because of initial chaos probing. Gave up %i time(s). Computed mean chaos is %f %%.�d � )�ndigitsz=%s passed initial chaos probing. Mean measured chaos is %f %%z&{} should target any language(s) of {}�,z We detected language {} using {}r z.Encoding detection: %s is most likely the one.zoEncoding detection: %s is most likely the one as we detected a BOM or SIG within the beginning of the sequence.zONothing got out of the detection process. 
Using ASCII/UTF-8/Specified fallback.z7Encoding detection: %s will be used as a fallback matchz:Encoding detection: utf_8 will be used as a fallback matchz:Encoding detection: ascii will be used as a fallback matchz]Encoding detection: Found %s as plausible (best-candidate) for content. With %i alternatives.z=Encoding detection: Unable to determine any suitable charset.)4�
isinstance� bytearray�bytes� TypeError�format�type�logger�level�
addHandler�explain_handler�setLevelr �len�debug�
removeHandler�logging�WARNINGr r �log�join�intr r r �append�setr r
�addr r �ModuleNotFoundError�ImportError�str�UnicodeDecodeError�LookupErrorr �range�maxr r �decode�sum�roundr
r r r r: �fingerprint�best)/r r! r"