# is_x86_feature_detected in Teaclave SGX SDK
# Background
Crates often use `is_x86_feature_detected!` to select appropriate implementations at runtime (such as AVX/SSE/SSSE3/FMA). On x86_64, the default libstd implementation of this macro executes the `cpuid` instruction, which is illegal inside an SGX enclave. We want to avoid such SGX-incompatible instructions and the unnecessary AEX events they trigger.
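To illustrate why this matters, here is a minimal sketch of the dispatch pattern such crates typically use (the function names `dot_avx2` and `dot_fallback` are hypothetical, and an x86_64 target is assumed); with the default libstd, the macro call below would execute `cpuid` on first use:

```rust
// Typical runtime-dispatch pattern in SIMD-accelerated crates. In the default
// libstd, is_x86_feature_detected! runs `cpuid` the first time it is called,
// which faults inside an SGX enclave.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    if is_x86_feature_detected!("avx2") {
        // SAFETY: only called after verifying AVX2 is available.
        unsafe { dot_avx2(a, b) }
    } else {
        dot_fallback(a, b)
    }
}

#[target_feature(enable = "avx2")]
unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
    // An AVX2-specialized implementation would go here; we just reuse the
    // scalar version so the sketch stays self-contained.
    dot_fallback(a, b)
}

fn dot_fallback(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```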
# Solution
We found that Intel's SGX SDK initializes its optimized libraries in two steps:
- Initialize a global CPU feature indicator from an enclave initialization parameter in the uRTS:

```cpp
// Since CPUID instruction is NOT supported within enclave, we enumerate
// the cpu features here and send to tRTS.
get_cpu_features(&info.cpu_features);
get_cpu_features_ext(&info.cpu_features_ext);
init_cpuinfo((uint32_t *)info.cpuinfo_table);
```
- Initialize the optimized libraries according to the global CPU feature indicator in the tRTS:

```cpp
// optimized libs
if (SDK_VERSION_2_0 < g_sdk_version || sys_features.size != 0)
{
    if (0 != init_optimized_libs(cpu_features, (uint32_t*)sys_features.cpuinfo_table, xfrm))
    {
        return -1;
    }
}
```
We found that in `init_optimized_libs`, a global variable `g_cpu_feature_indicator` is initialized to store the `feature_bit_array`, which contains everything we need!

```cpp
static int set_global_feature_indicator(uint64_t feature_bit_array, uint64_t xfrm) {
    ......
    g_cpu_feature_indicator = feature_bit_array;
    return 0;
}
```
Since the Rust SGX SDK depends on tRTS, we can simply re-use `g_cpu_feature_indicator` and simulate the `is_x86_feature_detected!` macro easily! First we import the value from tRTS:

```rust
use sgx_types::{c_int, uint64_t}; // C-compatible type aliases from sgx_types

#[link(name = "sgx_trts")]
extern "C" {
    static g_cpu_feature_indicator: uint64_t;
    static EDMM_supported: c_int;
}

#[inline]
pub fn rsgx_get_cpu_feature() -> u64 {
    unsafe { g_cpu_feature_indicator }
}
```
Then we parse `g_cpu_feature_indicator` just like std_detect does:

```rust
#[macro_export]
macro_rules! is_cpu_feature_supported {
    ($feature:expr) => {
        ($feature & $crate::enclave::rsgx_get_cpu_feature()) != 0
    };
}

#[macro_export]
macro_rules! is_x86_feature_detected {
    ("ia32") => {
        $crate::cpu_feature::check_for($crate::cpu_feature::Feature::ia32)
    };
    ...
}
```
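With these macros in place, enclave code can dispatch exactly as it would with libstd, but the check reads `g_cpu_feature_indicator` instead of executing `cpuid`, so no AEX is triggered. A minimal usage sketch (assuming an `"avx2"` arm exists in the elided part of the macro above; `checksum_avx2` is a hypothetical placeholder):

```rust
pub fn checksum(data: &[u8]) -> u64 {
    // Safe inside the enclave: the branch reads a global set at enclave
    // initialization rather than running cpuid.
    if is_x86_feature_detected!("avx2") {
        checksum_avx2(data)
    } else {
        checksum_fallback(data)
    }
}

// Hypothetical specialized routine; a real one would use AVX2 intrinsics
// behind #[target_feature(enable = "avx2")].
fn checksum_avx2(data: &[u8]) -> u64 {
    checksum_fallback(data)
}

fn checksum_fallback(data: &[u8]) -> u64 {
    data.iter().map(|&b| b as u64).sum()
}
```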
# Performance concerns
We observed that some crates (such as matrixmultiply) tend to use the highest instruction-set level available for speedup, but that is not always the best choice. For example, the "machine-learning" SGX sample depends on rusty-machine and matrixmultiply, which use AVX instructions if supported. However, in our tests the "fallback" mode was about 10x faster than the AVX version. AVX optimization is complicated, and I have not had time to read the Intel® 64 and IA-32 Architectures Optimization Reference Manual; I also doubt that either the crate owners or the LLVM backend can optimize it ideally. I recommend choosing the appropriate instruction set per workload.
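One way to follow this advice, assuming you control the dispatch point: benchmark each path offline on representative inputs, then route by workload shape rather than always taking the widest instruction set. A sketch under those assumptions; `gemm_avx`, `gemm_fallback`, and the threshold value are hypothetical placeholders, not matrixmultiply or SDK APIs:

```rust
// Per-workload dispatch: prefer the path that measured fastest for this
// input size, not merely the highest instruction set the CPU reports.
pub fn gemm(n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    // Threshold determined by offline benchmarking for this (hypothetical)
    // workload; below it, the fallback wins despite AVX being available.
    const AVX_THRESHOLD: usize = 512;
    if n >= AVX_THRESHOLD && is_x86_feature_detected!("avx") {
        gemm_avx(n, a, b, c);
    } else {
        gemm_fallback(n, a, b, c);
    }
}

fn gemm_avx(n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    // A real version would use AVX intrinsics; reuse the naive kernel here
    // to keep the sketch self-contained.
    gemm_fallback(n, a, b, c)
}

fn gemm_fallback(n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    // Naive triple loop over row-major n x n matrices.
    for i in 0..n {
        for j in 0..n {
            let mut s = 0.0;
            for k in 0..n {
                s += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = s;
        }
    }
}
```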